Abstract

In a companion paper it was shown that the class of constant-depth determinate k-ary
recursive clauses is efficiently learnable. In this paper we present negative results showing 
that any natural generalization of this class is hard to learn in Valiant's model of pac- 
learnability. In particular, we show that the following program classes are cryptographically 
hard to learn: programs with an unbounded number of constant-depth linear recursive 
clauses; programs with one constant-depth determinate clause containing an unbounded 
number of recursive calls; and programs with one linear recursive clause of constant locality. 
These results immediately imply the non-learnability of any more general class of programs. 
We also show that learning a constant-depth determinate program with either two linear 
recursive clauses or one linear recursive clause and one non-recursive clause is as hard as 
learning boolean DNF. Together with positive results from the companion paper, these 
negative results establish a boundary of efficient learnability for recursive function-free 
clauses. 

1. Introduction 

Inductive logic programming (ILP) (Muggleton, 1992; Muggleton & De Raedt, 1994) is
an active area of machine learning research in which the hypotheses of a learning system 
are expressed in a logic programming language. While many different learning problems 
have been considered in ILP, including some of great practical interest (Muggleton, King, 
& Sternberg, 1992; King, Muggleton, Lewis, & Sternberg, 1992; Zelle & Mooney, 1994;
Cohen, 1994b), a class of problems that is frequently considered is to reconstruct simple 
list-processing or arithmetic functions from examples. A prototypical problem of this sort 
might be learning to append two lists. Often, this sort of task is attempted using only 
randomly-selected positive and negative examples of the target concept. 

Based on its similarity to the problems studied in the field of automatic programming 
from examples (Summers, 1977; Biermann, 1978), we will (informally) call this class of 
learning tasks automatic logic programming problems. While a number of experimental 
systems have been built (Quinlan & Cameron-Jones, 1993; Aha, Lapointe, Ling, & Matwin,
1994), the experimental success in automatic logic programming systems has been limited. 
One common property of automatic logic programming problems is the presence of recur- 
sion. The goal of this paper is to explore by analytic methods the computational limitations 
on learning recursive programs in Valiant's model of pac-learnability (1984). (In brief, this 
model requires that an accurate approximation of the target concept be found in polyno- 
mial time using a polynomial-sized set of labeled examples, which are chosen stochastically.) 
While it will surprise nobody that such limitations exist, it is far from obvious from previous 
research where these limits lie: there are few provably fast methods for learning recursive
logic programs, and even fewer meaningful negative results. 

The starting point for this investigation is a series of positive learnability results appear- 
ing in a companion paper (Cohen, 1995). These results show that a single constant-depth 
determinate clause with a constant number of "closed" recursive calls is pac-learnable. They 
also show that a two-clause constant-depth determinate program consisting of one nonre- 
cursive clause and one recursive clause of the type described above is pac-learnable, if some 
additional "hints" about the target concept are provided. 

In this paper, we analyze a number of generalizations of these learnable languages. We 
show that relaxing any of the restrictions leads to difficult learning problems: in
particular, learning problems that are either as hard as learning DNF (an open problem in
computational learning theory), or as hard as cracking certain presumably secure crypto- 
graphic schemes. The main contribution of this paper, therefore, is a delineation of the 
boundaries of learnability for recursive logic programs. 

The paper is organized as follows. In Section 2 we define the classes of logic programs and 
the learnability models that are used in this paper. In Section 3 we present cryptographic 
hardness results for two classes of constant-depth determinate recursive programs: programs 
with n linear recursive clauses, and programs with one n-ary recursive clause. We also
analyze the learnability of clauses of constant locality, another class of clauses that is pac- 
learnable in the nonrecursive case, and show that even a single linearly recursive local 
clause is cryptographically hard to learn. We then turn, in Section 4, to the analysis of 
even more restricted classes of recursive programs. We show that two different classes of 
constant-depth determinate programs are prediction-equivalent to boolean DNF: the class 
of programs containing a single linear recursive clause and a single nonrecursive clause, and 
the class of programs containing two linearly recursive clauses. Finally, we summarize the 
results of this paper and its companion, discuss related work, and conclude. 

Although this paper can be read independently of its companion paper, we suggest that
readers planning to read both papers begin with the companion paper (Cohen, 1995). 

2. Background 

For completeness, we will now present the technical background needed to state our results; 
however, aside from Sections 2.2 and 2.3, which introduce polynomial predictability and 
prediction-preserving reducibilities, respectively, this background closely follows that pre- 
sented in the companion paper (Cohen, 1995). Readers are encouraged to skip this section 
if they are already familiar with the material. 

2.1 Logic Programs 

We will assume that the reader has some familiarity with logic programming (such as can
be obtained by reading one of the standard texts (Lloyd, 1987)). Our treatment of logic
programs differs only in that we will usually consider the body of a clause to be an ordered 
set of literals. We will also consider only logic programs without function symbols — i.e., 
programs written in Datalog. 

The semantics of a Datalog program P will be defined relative to a database, DB,
which is a set of ground atomic facts. (When convenient, we will also think of DB as a
conjunction of ground unit clauses.) In particular, we will interpret P and DB as a subset
of the set of all extended instances. An extended instance is a pair (f, D) in which the
instance fact f is a ground fact, and the description D is a set of ground unit clauses. An
extended instance (f, D) is covered by (P, DB) iff

$DB \wedge D \wedge P \vdash f$

If extended instances are allowed, then function-free programs can encode many com- 
putations that are usually represented with function symbols. For example, a function-free 
program that tests to see if a list is the append of two other lists can be written as follows: 

Program P:

    append(Xs,Ys,Ys) ←
        null(Xs).
    append(Xs,Ys,Zs) ←
        components(Xs,X,Xs1) ∧
        components(Zs,X,Zs1) ∧
        append(Xs1,Ys,Zs1).

Database DB:

    null(nil).

Here the predicate components(A,B,C) means that A is a list with head B and tail C; thus
an extended instance equivalent to append([1,2],[3],[1,2,3]) would have the instance fact
f = append(list12, list3, list123) and a description containing these atoms:

components(list12,1,list2), components(list2,2,nil),
components(list123,1,list23), components(list23,2,list3),
components(list3,3,nil)
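
As a rough illustration of how (P, DB) covers an extended instance, the following Python sketch hard-codes the two append clauses and checks the instance fact against DB together with the description D. The tuple encoding of facts and the function name `append_holds` are assumptions made for this sketch; they are not notation from the paper, and the check is not a general Datalog evaluator.

```python
# Facts are encoded as tuples (predicate, arg1, arg2, ...); this encoding is illustrative only.
DB = {("null", "nil")}
D = {
    ("components", "list12", 1, "list2"),
    ("components", "list2", 2, "nil"),
    ("components", "list123", 1, "list23"),
    ("components", "list23", 2, "list3"),
    ("components", "list3", 3, "nil"),
}

def append_holds(xs, ys, zs, facts):
    """Top-down check of append(xs, ys, zs) using the two clauses of program P."""
    # Clause 1: append(Xs,Ys,Ys) <- null(Xs).
    if ("null", xs) in facts and ys == zs:
        return True
    # Clause 2: append(Xs,Ys,Zs) <-
    #     components(Xs,X,Xs1), components(Zs,X,Zs1), append(Xs1,Ys,Zs1).
    for f in facts:
        if f[0] == "components" and f[1] == xs:
            x, xs1 = f[2], f[3]
            for g in facts:
                if g[0] == "components" and g[1] == zs and g[2] == x:
                    if append_holds(xs1, ys, g[3], facts):
                        return True
    return False

# The instance fact append(list12, list3, list123) is provable from DB and D.
covered = append_holds("list12", "list3", "list123", DB | D)
# Changing the third argument gives an instance that is not covered.
not_covered = append_holds("list12", "list3", "list23", DB | D)
```

The recursion terminates here because the description encodes finite, acyclic lists; a description with cyclic components facts would need an explicit loop check.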

The use of extended instances and function-free programs is closely related to "flattening" 
(Rouveirol, 1994; De Raedt & Dzeroski, 1994); some experimental learning systems also 
impose a similar restriction (Quinlan, 1990; Pazzani & Kibler, 1992). Another motivation 
for using extended instances is technical. Under the (sometimes quite severe) syntactic 
restrictions considered in this paper, there are often only a polynomial number of possible 
ground facts — i.e., the Herbrand base is polynomial. Hence if programs were interpreted 
in the usual model-theoretic way it would be possible to learn a program equivalent to any 
given target by simply memorizing the appropriate subset of the Herbrand base. However, 
if programs are interpreted as sets of extended instances, such trivial learning algorithms 
become impossible; even for extremely restricted program classes there are still an expo- 
nential number of extended instances of size n. Further discussion can be found in the 
companion paper (Cohen, 1995). 

Below we will define some of the terminology for logic programs that will be used in this 
paper. 

2.1.1 Input/Output Variables 

If $A \leftarrow B_1 \wedge \ldots \wedge B_r$ is an (ordered) definite clause, then the input variables of the literal $B_i$
are those variables which also appear in the clause $A \leftarrow B_1 \wedge \ldots \wedge B_{i-1}$; all other variables
appearing in $B_i$ are called output variables.

2.1.2 Types of Recursion

A literal in the body of a clause is a recursive literal if it has the same predicate symbol 
and arity as the head of the clause. If every clause in a program has at most one recursive 
literal, the program is linear recursive. If every clause in a program has at most k recursive 
literals, the program is k-ary recursive. If every recursive literal in a program contains no 
output variables, the program is closed recursive. 

2.1.3 Depth 

The depth of a variable appearing in an (ordered) clause $A \leftarrow B_1 \wedge \ldots \wedge B_r$ is defined as follows.
Variables appearing in the head of a clause have depth zero. Otherwise, let $B_i$ be the first
literal containing the variable V, and let d be the maximal depth of the input variables of
$B_i$; then the depth of V is d+1. The depth of a clause is the maximal depth of any variable
in the clause.
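
The depth definition can be made concrete with a small Python sketch applied to the recursive append clause of Section 2.1. The clause encoding (a list of head variables plus an ordered list of body literals) and the name `variable_depths` are assumptions of this sketch, as is the treatment of a literal with no input variables, which the definition above does not address.

```python
def variable_depths(head_args, body):
    """Return {variable: depth} for an ordered clause.

    head_args: list of head variables; body: ordered list of (predicate, [variables]).
    """
    depth = {v: 0 for v in head_args}              # head variables have depth zero
    for _pred, args in body:
        inputs = [v for v in args if v in depth]   # variables seen earlier in the clause
        # Assumption of this sketch: a literal with no input variables gets d = 0.
        d = max((depth[v] for v in inputs), default=0)
        for v in args:
            if v not in depth:                     # output variable: first occurrence
                depth[v] = d + 1
    return depth

# append(Xs,Ys,Zs) <- components(Xs,X,Xs1), components(Zs,X,Zs1), append(Xs1,Ys,Zs1).
body = [
    ("components", ["Xs", "X", "Xs1"]),
    ("components", ["Zs", "X", "Zs1"]),
    ("append", ["Xs1", "Ys", "Zs1"]),
]
depths = variable_depths(["Xs", "Ys", "Zs"], body)
clause_depth = max(depths.values())
```

Under this reading, X and Xs1 have depth 1, while Zs1 has depth 2 because the literal that introduces it has the depth-1 variable X among its inputs.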

2.1.4 Determinacy 

The literal $B_i$ in the clause $A \leftarrow B_1 \wedge \ldots \wedge B_r$ is determinate iff for every possible substitution
$\sigma$ that unifies A with some fact e such that

$DB \vdash B_1\sigma \wedge \ldots \wedge B_{i-1}\sigma$

there is at most one maximal substitution $\theta$ so that $DB \vdash B_i\sigma\theta$. A clause is determinate
if all of its literals are determinate. Informally, determinate clauses are those that can be 
evaluated without backtracking by a Prolog interpreter. 

The term ij-determinate (Muggleton & Feng, 1992) is sometimes used for programs that
are depth i, determinate, and contain literals of arity j or less. A number of experimental
systems exploit restrictions associated with limited depth and determinacy (Muggleton
& Feng, 1992; Quinlan, 1991; Lavrac & Dzeroski, 1992; Cohen, 1993c). The learnability
of constant-depth determinate clauses has also received some formal study (Dzeroski,
Muggleton, & Russell, 1992; Cohen, 1993a).

2.1.5 Mode Constraints and Declarations 

Mode declarations are commonly used in analyzing Prolog code or describing Prolog code;
for instance, the mode declaration "components(+,-,-)" indicates that the predicate
components can be used when its first argument is an input and its second and third arguments
are outputs. Formally, we define the mode of a literal L appearing in a clause C to be a
string s such that the initial character of s is the predicate symbol of L, and for j > 1
the j-th character of s is a "+" if the (j-1)-th argument of L is an input variable and a
"-" if the (j-1)-th argument of L is an output variable. (This definition assumes that all
arguments to the head of a clause are inputs; this is justified since we are considering only
how clauses behave in classifying extended instances, which are ground.) A mode constraint
is a set of mode strings $R = \{s_1, \ldots, s_k\}$, and a clause C is said to satisfy a mode constraint
R for p if for every literal L in the body of C, the mode of L is in R.
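
A minimal Python sketch of these definitions follows, again using the recursive append clause. The clause encoding and the names `literal_modes` and `satisfies` are assumptions of this sketch; the mode string is written as the predicate name followed by one "+" or "-" per argument.

```python
def literal_modes(head_args, body):
    """Mode string of each body literal: predicate name, then '+' or '-' per argument."""
    seen = set(head_args)                  # arguments of the head count as inputs
    modes = []
    for pred, args in body:
        marks = "".join("+" if v in seen else "-" for v in args)
        modes.append(pred + marks)
        seen.update(args)                  # outputs of this literal are inputs later on
    return modes

def satisfies(head_args, body, R):
    """True iff the mode of every body literal is in the mode constraint R."""
    return all(m in R for m in literal_modes(head_args, body))

# append(Xs,Ys,Zs) <- components(Xs,X,Xs1), components(Zs,X,Zs1), append(Xs1,Ys,Zs1).
body = [
    ("components", ["Xs", "X", "Xs1"]),
    ("components", ["Zs", "X", "Zs1"]),
    ("append", ["Xs1", "Ys", "Zs1"]),
]
modes = literal_modes(["Xs", "Ys", "Zs"], body)
ok = satisfies(["Xs", "Ys", "Zs"], body, {"components+--", "components++-", "append+++"})
too_strict = satisfies(["Xs", "Ys", "Zs"], body, {"components+--", "append+++"})
```

The recursive literal has mode append+++, so it contains no output variables and the clause is closed recursive in the sense of Section 2.1.2.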

We define a declaration to be a tuple (p, a', R) where p is a predicate symbol, a' is an 
integer, and R is a mode constraint. We will say that a clause C satisfies a declaration if 
the head of C has arity a' and predicate symbol p, and if for every literal L in the body of
C the mode of L appears in R. 

2.1.6 Determinate Modes 

In a typical setting, the facts in the database DB and extended instances are not arbitrary:
instead, they are representative of some "real" predicate, which may obey certain restrictions.
Let us assume that all database and extended-instance facts will be drawn from some
(possibly infinite) set $\mathcal{F}$. Informally, a mode is determinate if the input positions of the
facts in $\mathcal{F}$ functionally determine the output positions. Formally, if $f = p(t_1, \ldots, t_k)$ is a
fact with predicate symbol p and $p\alpha$ is a mode, then define $inputs(f, p\alpha)$ to be $(t_{i_1}, \ldots, t_{i_k})$,
where $i_1, \ldots, i_k$ are the indices of $\alpha$ containing a "+", and define $outputs(f, p\alpha)$ to be
$(t_{j_1}, \ldots, t_{j_l})$, where $j_1, \ldots, j_l$ are the indices of $\alpha$ containing a "-". We define a mode
string $p\alpha$ for a predicate p to be determinate for $\mathcal{F}$ iff

$\{(inputs(f, p\alpha), outputs(f, p\alpha)) : f \in \mathcal{F}\}$

is a function. Any clause that satisfies a declaration $Dec \in \mathcal{D}et\mathcal{DEC}$ must be determinate.

The set of all declarations containing only modes determinate for $\mathcal{F}$ will be denoted
$\mathcal{D}et\mathcal{DEC}_{\mathcal{F}}$. Since in this paper the set $\mathcal{F}$ will be assumed to be fixed, we will generally omit
the subscript.
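
The functional-dependency test behind a determinate mode can be sketched in Python as follows. Since the fact set may be infinite, the sketch only checks a finite list of facts for one predicate; the function name `is_determinate` and the sample facts are assumptions made for illustration.

```python
def is_determinate(facts, mode):
    """Check that the '+' positions of `mode` functionally determine the '-' positions.

    facts: iterable of argument tuples for a single predicate;
    mode: a string of '+' and '-' characters, one per argument.
    """
    seen = {}
    for args in facts:
        ins = tuple(a for a, m in zip(args, mode) if m == "+")
        outs = tuple(a for a, m in zip(args, mode) if m == "-")
        # The relation is a function iff equal inputs never map to different outputs.
        if seen.setdefault(ins, outs) != outs:
            return False
    return True

# A few components(List, Head, Tail) facts.
components_facts = [("list12", 1, "list2"), ("list2", 2, "nil"), ("list23", 2, "list3")]

list_in = is_determinate(components_facts, "+--")   # a list determines its head and tail
head_in = is_determinate(components_facts, "-+-")   # a head does not determine the list
```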

2.1.7 Bounds on Predicate Arity 

We will use the notation $a$-$\mathcal{DB}$ for the set of all databases that contain only facts of arity
a or less, and $a$-$\mathcal{DEC}$ for the set of all declarations (p, a', R) such that every string $s \in R$ is
of length a + 1 or less.

2.1.8 Size Measures 

The learning models presented in the following section will require the learner to use re- 
sources polynomial in the size of its inputs. Assuming that all predicates are arity a or 
less for some constant a allows very simple size measures to be used. In this paper, we will 
measure the size of a database DB by its cardinality; the size of an extended instance (f, D)
by the cardinality of D; the size of a declaration (p, a', R) by the cardinality of R; and the
size of a clause $A \leftarrow B_1 \wedge \ldots \wedge B_r$ by the number of literals in its body.

2.2 A Model of Learnability 

2.2.1 Preliminaries 

Let X be a set. We will call X the domain, and call the elements of X instances. Define a 
concept C over X to be a representation of some subset of X, and define a language Lang 
to be a set of concepts. In this paper, we will be rather casual about the distinction between 
a concept and the set it represents; when there is a risk of confusion we will refer to the 
set represented by a concept C as the extension of C. Two concepts $C_1$ and $C_2$ with the same
extension are said to be equivalent. Define an example of C to be a pair (e, b) where b = 1 if
$e \in C$ and b = 0 otherwise. If D is a probability distribution function, a sample of C from
X drawn according to D is a pair of multisets $S^+$, $S^-$ drawn from the domain X according
to D, $S^+$ containing only positive examples of C, and $S^-$ containing only negative ones.

Associated with X and Lang are two size complexity measures, for which we will use 
the following notation: 

• The size complexity of a concept $C \in Lang$ is written ||C||.

• The size complexity of an instance $e \in X$ is written ||e||.

• If S is a set, $S_n$ stands for the set of all elements of S of size complexity no greater
than n. For instance, $X_n = \{e \in X : \|e\| \le n\}$ and $Lang_n = \{C \in Lang : \|C\| \le n\}$.

We will assume that all size measures are polynomially related to the number of bits needed 
to represent C or e; this holds, for example, for the size measures for logic programs and 
databases defined above. 

2.2.2 Polynomial Predictability 

We now define polynomial predictability as follows. A language Lang is polynomially
predictable iff there is an algorithm PacPredict and a polynomial function $m(\frac{1}{\epsilon}, \frac{1}{\delta}, n_e, n_t)$
so that for every $n_t > 0$, every $n_e > 0$, every $C \in Lang_{n_t}$, every $\epsilon : 0 < \epsilon < 1$, every
$\delta : 0 < \delta < 1$, and every probability distribution function D, PacPredict has the following
behavior:

1. given a sample $S^+$, $S^-$ of C from $X_{n_e}$ drawn according to D and containing at least
$m(\frac{1}{\epsilon}, \frac{1}{\delta}, n_e, n_t)$ examples, PacPredict outputs a hypothesis H such that

$Prob(D(H - C) + D(C - H) > \epsilon) < \delta$

where the probability is taken over the possible samples $S^+$ and $S^-$ and (if PacPredict
is a randomized algorithm) over any coin flips made by PacPredict;

2. PacPredict runs in time polynomial in $\frac{1}{\epsilon}$, $\frac{1}{\delta}$, $n_e$, $n_t$, and the number of examples;
and

3. the hypothesis H can be evaluated in polynomial time.

The algorithm PacPredict is called a prediction algorithm for Lang, and the function
$m(\frac{1}{\epsilon}, \frac{1}{\delta}, n_e, n_t)$ is called the sample complexity of PacPredict. We will sometimes
abbreviate "polynomial predictability" as "predictability".

The first condition in the definition merely states that the error rate of the hypothesis 
must (usually) be low, as measured against the probability distribution D from which the 
training examples were drawn. The second condition, together with the stipulation that the 
sample size is polynomial, ensures that the total running time of the learner is polynomial. 
The final condition simply requires that the hypothesis be usable in the very weak sense 
that it can be used to make predictions in polynomial time. Notice that this is a worst case 
learning model, as the definition allows an adversarial choice of all the inputs of the learner. 
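
The error term D(H - C) + D(C - H) in the first condition is the probability mass of the instances on which the hypothesis and the target disagree. The Python sketch below computes it for a finite domain; the distribution, the sets, and the function name `error` are made-up illustrations, not data from the paper.

```python
def error(H, C, D):
    """D(H - C) + D(C - H) for finite sets H and C and a distribution D.

    D is given as a dict {instance: probability}.
    """
    # An instance contributes its probability iff exactly one of H and C contains it.
    return sum(p for x, p in D.items() if (x in H) != (x in C))

D = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
C = {"a", "b"}          # target concept
H = {"a", "c"}          # hypothesis: wrong on b (missed) and on c (wrongly included)
err = error(H, C, D)    # 0.25 + 0.125
```

A hypothesis with this error would be acceptable for any accuracy parameter above 0.375 and unacceptable otherwise.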

2.2.3 Relation to Other Models

The model of polynomial predictability has been well-studied (Pitt & Warmuth, 1990), and 
is a weaker version of Valiant's (1984) criterion of pac-learnability. A language Lang is
pac-learnable iff there is an algorithm PacLearn so that 

1. PacLearn satisfies all the requirements in the definition of polynomial predictability, 
and 

2. on inputs $S^+$ and $S^-$, PacLearn always outputs a hypothesis $H \in Lang$.

Thus if a language is pac-learnable it is predictable. 

In the companion paper (Cohen, 1995), our positive results are all expressed in the model 
of identifiability from equivalence queries, which is strictly stronger than pac-learnability; 
that is, anything that is learnable from equivalence queries is also necessarily pac-learnable.$^1$
Since this paper contains only negative results, we will use the relatively weak model
of predictability. Negative results in this model immediately translate to negative results 
in the stronger models; if a language is not predictable, it cannot be pac-learnable, nor 
identifiable from equivalence queries. 

2.2.4 Background Knowledge in Learning 

In a typical ILP system, the setting is slightly different, as the user usually provides clues 
about the target concept in addition to the examples, in the form of a database DB of 
"background knowledge" and a set of declarations. To account for these additional inputs it 
is necessary to extend the framework described above to a setting where the learner accepts 
inputs other than training examples. Following the formalization used in the companion 
paper (Cohen, 1995), we will adopt the notion of a "language family". 

If Lang is a set of clauses, DB is a database and Dec is a declaration, we will define
Lang[DB, Dec] to be the set of all pairs (C, DB) such that $C \in Lang$ and C satisfies Dec.
Semantically, such a pair will denote the set of all extended instances (f, D) covered by
(C, DB). Next, if $\mathcal{DB}$ is a set of databases and $\mathcal{DEC}$ is a set of declarations, then define

$Lang[\mathcal{DB}, \mathcal{DEC}] = \{Lang[DB, Dec] : DB \in \mathcal{DB} \text{ and } Dec \in \mathcal{DEC}\}$

This set of languages is called a language family.

We will now extend the definition of predictability to language families as follows.
A language family $Lang[\mathcal{DB}, \mathcal{DEC}]$ is polynomially predictable iff every language in the set
is predictable. A language family $Lang[\mathcal{DB}, \mathcal{DEC}]$ is uniformly polynomially predictable iff there is
a single algorithm Identify(DB, Dec) that predicts every Lang[DB, Dec] in the family
given DB and Dec.

The usual model of polynomial predictability is worst-case over all choices of the target 
concept and the distribution of examples. The notion of polynomial predictability of a 
language family extends this model in the natural way; the extended model is also worst-case
over all possible choices for database $DB \in \mathcal{DB}$ and $Dec \in \mathcal{DEC}$. This worst-case

1. An equivalence query is a question of the form "is H equivalent to the target concept?" which is answered 
with either "yes" or a counterexample. Identification by equivalence queries essentially means that the 
target concept can be exactly identified in polynomial time using a polynomial number of such queries.
model may seem unintuitive, since one typically assumes that the database DB is provided
by a helpful user, rather than an adversary. However, the worst-case model is reasonable
because learning is allowed to take time polynomial in the size of the smallest target concept
in the set Lang[DB, Dec]; this means that if the database given by the user is such that
the target concept cannot be encoded succinctly (or at all) learning is allowed to take more
time.

Notice that for a language family $Lang[\mathcal{DB}, \mathcal{DEC}]$ to be polynomially predictable, every
language in the family must be polynomially predictable. Thus to show that a family is not 
polynomially predictable it is sufficient to construct one language in the family for which 
learning is hard. The proofs of this paper will all have this form. 

2.3 Prediction-Preserving Reducibilities 

The principal technical tool used in our negative results is the notion of prediction-preserving
reducibility, as introduced by Pitt and Warmuth (1990). Prediction-preserving reducibilities
are a method of showing that one language is no harder to predict than another. Formally,
let $Lang_1$ be a language over domain $X_1$ and $Lang_2$ be a language over domain $X_2$.
We say that predicting $Lang_1$ reduces to predicting $Lang_2$, denoted $Lang_1 \trianglelefteq Lang_2$, if
there is a function $f_i : X_1 \rightarrow X_2$, henceforth called the instance mapping, and a function
$f_c : Lang_1 \rightarrow Lang_2$, henceforth called the concept mapping, so that the following all hold:

1. $x \in C$ if and only if $f_i(x) \in f_c(C)$; i.e., concept membership is preserved by the
mappings;

2. the size complexity of $f_c(C)$ is polynomial in the size complexity of C; i.e., the size
of concept representations is preserved within a polynomial factor;

3. $f_i(x)$ can be computed in polynomial time.

Note that $f_c$ need not be computable; also, since $f_i$ can be computed in polynomial time,
$f_i(x)$ must also preserve size within a polynomial factor.

Intuitively, $f_c(C_1)$ returns a concept $C_2 \in Lang_2$ that will "emulate" $C_1$ (i.e., make
the same decisions about concept membership) on examples that have been "preprocessed"
with the function $f_i$. If predicting $Lang_1$ reduces to predicting $Lang_2$ and a learning
algorithm for $Lang_2$ exists, then one possible scheme for learning concepts from $Lang_1$
would be the following. First, convert any examples of the unknown concept $C_1$ from
the domain $X_1$ to examples over the domain $X_2$ using the instance mapping $f_i$. If the
conditions of the definition hold, then since $C_1$ is consistent with the original examples,
the concept $f_c(C_1)$ will be consistent with their image under $f_i$; thus running the learning
algorithm for $Lang_2$ should produce some hypothesis H that is a good approximation of
$f_c(C_1)$. Of course, it may not be possible to map H back into the original language $Lang_1$,
as computing $f_c^{-1}$ may be difficult or impossible. However, H can still be used to predict
membership in $C_1$: given an example x from the original domain $X_1$, one can simply predict
$x \in C_1$ to be true whenever $f_i(x) \in H$.
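
The scheme just described can be sketched in Python with a textbook reduction that is not taken from this paper: conjunctions of possibly negated boolean literals reduce to monotone conjunctions by appending the complement of every bit to each instance. The function names and the elimination learner are assumptions chosen for illustration; note that only the instance mapping is ever computed, never the concept mapping.

```python
def f_i(x):
    """Instance mapping: append the complement of every bit, so that each
    negated literal over x becomes a plain variable over the image."""
    return tuple(x) + tuple(1 - b for b in x)

def learn_monotone_conjunction(positives, n):
    """Elimination learner for monotone conjunctions over n variables:
    keep exactly the variables that are 1 in every positive example."""
    keep = set(range(n))
    for y in positives:
        keep = {j for j in keep if y[j] == 1}
    return lambda y: all(y[j] == 1 for j in keep)

def predictor_via_reduction(positives, n):
    """Learn over the image domain, then predict on original instances through f_i."""
    H = learn_monotone_conjunction([f_i(x) for x in positives], 2 * n)
    return lambda x: H(f_i(x))

# Target concept over 3 bits: x1 AND (NOT x3).
positives = [(1, 0, 0), (1, 1, 0)]
predict = predictor_via_reduction(positives, 3)
```

Here the learned hypothesis lives in the image language (a monotone conjunction over six variables), yet `predict` classifies original three-bit instances, which is all that prediction requires.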

Pitt and Warmuth (1988) give a more rigorous argument that this approach leads to a 
prediction algorithm for LANGi, leading to the following theorem. 
Theorem 1 (Pitt and Warmuth) Assume $Lang_1 \trianglelefteq Lang_2$. Then if $Lang_1$ is not polynomially
predictable, $Lang_2$ is not polynomially predictable.

3. Cryptographic Limitations on Learning Recursive Programs 

Theorem 1 allows one to transfer hardness results from one language to another. This is 
useful because for a number of languages, it is known that prediction is as hard as breaking 
cryptographic schemes that are widely assumed to be secure. For example, it is known 
that predicting the class of languages accepted by deterministic finite state automata is 
"cryptographically hard", as is the class of languages accepted by log-space bounded Turing 
machines. 

In this section we will make use of Theorem 1 and previous cryptographic hardness 
results to show that certain restricted classes of recursive logic programs are hard to learn. 

3.1 Programs With n Linear Recursive Clauses 

In a companion paper (Cohen, 1995) we showed that a single linear closed recursive clause 
was identifiable from equivalence queries. In this section we will show that a program with 
a polynomial number of such clauses is not identifiable from equivalence queries, nor even 
polynomially predictable. 

Specifically, let us extend our notion of a "family of languages" slightly, and let
DLOG[n, s] represent the language of log-space bounded deterministic Turing machines with
up to s states accepting inputs of size n or less, with the usual semantics and complexity
measure.$^2$ Also let d-DepthLinRecProg denote the family of logic programs containing
only depth-d linear closed recursive clauses, but containing any number of such clauses. We
have the following result:

Theorem 2 For every n and s, there exists a database $DB_{n,s} \in 1$-$\mathcal{DB}$ and declaration
$Dec_{n,s} \in 1$-$\mathcal{D}et\mathcal{DEC}$ of sizes polynomial in n and s such that

$DLOG[n,s] \trianglelefteq 1\text{-DepthLinRecProg}[DB_{n,s}, Dec_{n,s}]$

Hence for $d \ge 1$ and $a \ge 1$, $d$-DepthLinRecProg[$\mathcal{DB}$, $a$-$\mathcal{D}et\mathcal{DEC}$] is not uniformly
polynomially predictable under cryptographic assumptions.$^3$

Proof: Recall that a log-space bounded Turing machine (TM) has an input tape of length
n, a work tape of length $\log_2 n$ which initially contains all zeros, and a finite state control
with state set Q. To simplify the proof, we assume without loss of generality that the tape
and input alphabets are binary, that there is a single accepting state $q_f \in Q$, and that the
machine will always erase its work tape and position the work tape head at the far left after
it decides to accept its input.

At each time step, the machine will read the tape squares under its input tape head and 
work tape head, and based on these values and its current state q, it will 

2. I.e., a machine represents the set of all inputs that it accepts, and its complexity is the number of states. 

3. Specifically, this language is not uniformly polynomially predictable unless all of the following crypto- 
graphic problems can be solved in polynomial time: solving the quadratic residue problem, inverting the 
RSA encryption function, and factoring Blum integers. This result holds because all of these crypto- 
graphic problems can be reduced to learning DLOG Turing machines (Kearns & Valiant, 1989). 
• write either a 1 or a 0 on the work tape,

• shift the input tape head left or right, 

• shift the work tape head left or right, and 

• transition to a new internal state q' 

A deterministic machine can thus be specified by a transition function 

$\delta : \{0,1\} \times \{0,1\} \times Q \rightarrow \{0,1\} \times \{L,R\} \times \{L,R\} \times Q$

Let us define the internal configuration of a TM to consist of the string of symbols 
written on the worktape, the position of the tape heads, and the internal state q of the 
machine: thus a configuration is an element of the set 

$CON = \{0,1\}^{\log_2 n} \times \{1, \ldots, \log_2 n\} \times \{1, \ldots, n\} \times Q$

A simplified specification for the machine is the transition function

$\delta' : \{0,1\} \times CON \rightarrow CON$

where the component {0, 1} represents the contents of the input tape at the square below 
the input tape head. 

Notice that for a machine whose worktape size is bounded by log n, the cardinality of
CON is only $p = |Q| n^2 \log_2 n$, a polynomial in n and $s = |Q|$. We will use this fact in our
constructions.
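
The count follows because there are $2^{\log_2 n} = n$ work tape contents, $\log_2 n$ work head positions, n input head positions, and |Q| states. A short Python sketch can confirm this by brute-force enumeration for a small case; the function name `configurations`, the example state names, and the restriction to n a power of two are assumptions of the sketch.

```python
from itertools import product
from math import log2

def configurations(n, states):
    """Enumerate CON for input length n (assumed to be a power of two)."""
    w = int(log2(n))                          # work tape length
    tapes = product((0, 1), repeat=w)         # all work tape contents
    # (work tape, work head position, input head position, state)
    return list(product(tapes, range(1, w + 1), range(1, n + 1), states))

n, Q = 4, ["q0", "q1", "qf"]
p = len(configurations(n, Q))                 # number of internal configurations
formula = len(Q) * n**2 * int(log2(n))        # |Q| * n^2 * log2(n)
```

For n = 4 and three states both quantities equal 96.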

The background database $DB_{n,s}$ is as follows. First, for $i = 0, \ldots, p$, an atom of the
form $con_i(c_i)$ is present. Each constant $c_i$ will represent a different internal configuration of
the Turing machine. We will also arbitrarily select $c_1$ to represent the (unique) accepting
configuration, and add to $DB_{n,s}$ the atom accepting($c_1$). Thus

$DB_{n,s} = \{con_i(c_i)\}_{i=1}^{p} \cup \{accepting(c_1)\}$

Next, we define the instance mapping. An instance in the Turing machine's domain is
a binary string $X = b_1 \ldots b_n$; this is mapped by $f_i$ to the extended instance (f, D) where

$f = accepting(c_0)$

$D = \{true_i\}_{b_i \in X : b_i = 1} \cup \{false_i\}_{b_i \in X : b_i = 0}$

The description atoms have the effect of defining the predicate $true_i$ to be true iff the i-th
bit of X is a "1", and of defining the predicate $false_i$ to be true iff the i-th bit of X is
"0". The constant $c_0$ will represent the start configuration of the Turing machine, and the
predicate accepting(C) will be defined so that it is true iff the Turing machine accepts input
X starting from state C.

We will let $Dec_{n,s} = (accepting, 1, R)$ where R contains the modes $con_i(+)$ and $con_i(-)$,
for $i = 1, \ldots, p$; and $true_j$ and $false_j$ for $j = 1, \ldots, n$.

Finally, for the concept mapping $f_c$, let us assume some arbitrary one-to-one mapping
$\eta$ between the internal configurations of a Turing machine M and the predicate names
$con_0, \ldots, con_{p-1}$ such that the start configuration $(0^{\log_2 n}, 1, q_0)$ maps to $con_0$ and the
accepting configuration $(0^{\log_2 n}, 1, q_f)$ maps to $con_1$. We will construct the program $f_c(M)$
as follows. For each transition $\delta'(1, c) \rightarrow c'$ in $\delta'$, where c and c' are in CON, construct a
clause of the form

$accepting(C) \leftarrow con_j(C) \wedge true_i \wedge con_{j'}(C1) \wedge accepting(C1).$

where i is the position of the input tape head which is encoded in c, $con_j = \eta(c)$, and
$con_{j'} = \eta(c')$. For each transition $\delta'(0, c) \rightarrow c'$ in $\delta'$ construct an analogous clause, in
which $true_i$ is replaced with $false_i$.

Now, we claim that for this program P, the machine M will accept when started in
configuration $c_i$ iff

$DB_{n,s} \wedge D \wedge P \vdash accepting(c_i)$

and hence that this construction preserves concept membership. This is perhaps easiest
to see by considering the action of a top-down theorem prover when given the goal
accepting(C): the sequence of subgoals $accepting(c_i), accepting(c_{i+1}), \ldots$ generated by the
theorem-prover precisely parallels the sequence of configurations $c_i, \ldots$ entered by the Turing
machine.

It is easily verified that the size of this program is polynomial in n and s, and that the
clauses are linear recursive, determinate, and of depth one, completing the proof. ■
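
The construction in the proof can be sketched in Python at the level of configurations. The encoding below is an assumption made for illustration: configurations are named by their index under the mapping to predicate names (0 for the start configuration, 1 for the accepting one), and a clause is stored as a tuple rather than as a Datalog clause. The toy machine is likewise made up and ignores its work tape.

```python
def build_program(delta, head_pos):
    """One clause per transition of delta' : {0,1} x CON -> CON.

    delta maps (bit, j) to j2, where j and j2 are configuration indices;
    head_pos[j] is the input head position i encoded in configuration j.
    A clause (j, i, bit, j2) stands for
      accepting(C) <- con_j(C), true_i (false_i if bit == 0), con_j2(C1), accepting(C1).
    """
    return [(j, head_pos[j], bit, j2) for (bit, j), j2 in delta.items()]

def covered(program, bits, start=0, accept=1):
    """Top-down proof of accepting(c_start); accepting(c_accept) is a database fact."""
    goal, seen = start, set()
    while goal != accept:
        if goal in seen:                  # the machine loops, so no finite proof exists
            return False
        seen.add(goal)
        # true_i is in D iff bit i of the input is 1, false_i iff it is 0 (1-based i).
        successors = [j2 for (j, i, bit, j2) in program
                      if j == goal and bits[i - 1] == bit]
        if not successors:
            return False
        goal = successors[0]              # a deterministic machine yields at most one
    return True

# Toy machine over 3 input bits that accepts iff every bit is 1.
head_pos = {0: 1, 2: 2, 3: 3}
delta = {(1, 0): 2, (1, 2): 3, (1, 3): 1}
P = build_program(delta, head_pos)
```

The successive values of `goal` are exactly the configurations the machine enters, which mirrors the argument that the subgoals of the theorem prover parallel the computation.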

There are a number of ways in which this result can be strengthened. Precisely the
same construction used above can be used to reduce the class of nondeterministic log-space 
bounded Turing machines to the constant-depth determinate linear recursive programs. 
Further, a slight modification to the construction can be used to reduce the class of log-space 
bounded alternating Turing machines (Chandra, Kozen, & Stockmeyer, 1981) to constant- 
depth determinate 2-ary recursive programs. The modification is to emulate configurations 
corresponding to universal states of the Turing machine with clauses of the form 

$accepting(C) \leftarrow con_j(C) \wedge true_i \wedge con_{j_1'}(C1) \wedge accepting(C1) \wedge con_{j_2'}(C2) \wedge accepting(C2).$

where $con_{j_1'}$ and $con_{j_2'}$ are the two successors to the universal configuration $con_j$. This is
a very strong result, since log-space bounded alternating Turing machines are known to be 
able to perform every polynomial-time computation. 

3.2 Programs With One n-ary Recursive Clause

We will now consider learning a single recursive clause with arbitrary closed recursion. 
Again, the key result of this section is an observation about expressive power: there is 
a background database that allows every log-space deterministic Turing machine M to 
be emulated by a single recursive constant-depth determinate clause. This leads to the 
following negative predictability result. 
Theorem 3 For every $n$ and $s$, there exists a database $DB_{n,s} \in 3$-$\mathcal{DB}$ and declaration
$Dec_{n,s} \in 3$-$\mathcal{D}et\mathcal{DEC}$ of sizes polynomial in $n$ and $s$ such that

$$\text{DLOG}[n,s] \trianglelefteq 3\text{-DepthRec}[DB_{n,s}, Dec_{n,s}]$$

Hence for $d \ge 3$ and $a \ge 3$, $d$-DepthRec[$\mathcal{DB}$, $a$-$\mathcal{D}et\mathcal{DEC}$] is not uniformly polynomially
predictable under cryptographic assumptions.

Proof: Consider a DLOG machine $M$. As in the proof of Theorem 2, we assume without
loss of generality that the tape alphabet is $\{0, 1\}$, that there is a unique starting configuration
$c_0$, and that there is a unique accepting configuration $c_1$. We will also assume without
loss of generality that there is a unique "failing" configuration $c_{fail}$; and that there is exactly
one transition of the form 

for every combination of $i \in \{1, \ldots, n\}$, $b \in \{0, 1\}$, and $c_j \in \text{CON} - \{c_1, c_{fail}\}$. Thus on
input $X = x_1 \ldots x_n$ the machine $M$ starts with CONFIG=$c_0$, then executes transitions
until it reaches CONFIG=$c_1$ or CONFIG=$c_{fail}$, at which point $X$ is accepted or rejected
(respectively). We will use $p$ for the number of configurations. (Recall that $p$ is polynomial
in $n$ and $s$.)

To emulate $M$, we will convert an example $X = b_1 \ldots b_n$ into the extended instance
$f_i(X) = (f, D)$ where

$$f = accepting(c_0)$$
$$D = \{bit_i(b_i)\}_{i=1}^{n}$$

Thus the predicate $bit_i(X)$ binds $X$ to the $i$-th bit of the TM's input tape. We also will
define the following predicates in the background database $DB_{n,s}$.

• For every possible $b \in \{0, 1\}$ and $j : 1 \le j \le p(n)$, the predicate $status_{b,j}(B,C,Y)$ will
be defined so that given bindings for variables $B$ and $C$, $status_{b,j}(B,C,Y)$ will fail if
$C = c_{fail}$; otherwise it will succeed, binding $Y$ to active if $B = b$ and $C = c_j$ and
binding $Y$ to inactive otherwise.

• For $j' : 1 \le j' \le p(n)$, the predicate $next_{j'}(Y,C)$ will succeed iff $Y$ can be bound to
either active or inactive. If $Y$ = active, then $C$ will be bound to $c_{j'}$; otherwise, $C$ will be
bound to the accepting configuration $c_1$.

• The database also contains the fact $accepting(c_1)$.

It is easy to show that the size of this database is polynomial in n and s. 

The declaration $Dec_{n,s}$ is defined to be $(accepting, 1, R)$ where $R$ includes the modes
$status_{b,j}(+, +, -)$, $next_j(+, -)$, and $bit_i(-)$ for $b \in \{0, 1\}$, $j = 1, \ldots, p$, and $i = 1, \ldots, n$.

Now, consider the transition rule $\delta'(b, c_j) \rightarrow c_{j'}$, and the corresponding conjunction

$$TRANS_{ibj} = bit_i(B_{ibj}) \wedge status_{b,j}(C, B_{ibj}, Y_{ibj}) \wedge next_{j'}(Y_{ibj}, C1_{ibj}) \wedge accepting(C1_{ibj})$$
Given $DB_{n,s}$ and $D$, and assuming that $C$ is bound to some configuration $c$, this conjunction
will fail if $c = c_{fail}$. It will succeed if $x_i \ne b$ or $c \ne c_j$; in this case $Y_{ibj}$ will be bound to
inactive, $C1_{ibj}$ will be bound to $c_1$, and the recursive call succeeds because $accepting(c_1)$ is in
$DB_{n,s}$. Finally, if $x_i = b$ and $c = c_j$, $TRANS_{ibj}$ will succeed only if the atom $accepting(c_{j'})$
is provable; in this case, $Y_{ibj}$ will be bound to active and $C1_{ibj}$ will be bound to $c_{j'}$.
From this it is clear that the clause $f_c(M)$ below

$$accepting(C) \leftarrow \bigwedge_{\substack{i \in \{1,\ldots,n\},\ b \in \{0,1\} \\ j \in \{1,\ldots,p\}}} TRANS_{ibj}$$

will correctly emulate the machine $M$ on examples that have been preprocessed with the
function $f_i$ described above. Hence this construction preserves concept membership. It is
also easily verified that the size of this program is polynomial in $n$ and $s$, and that the
clause is determinate and of depth three. ■



3.3 One k-Local Linear Closed Recursive Clause

So far we have considered only one class of extensions to the positive result given in the 
companion paper (Cohen, 1995) — namely, relaxing the restrictions imposed on the recursive 
structure of the target program. Another reasonable question to ask is if linear closed 
recursive programs can be learned without the restriction of constant-depth determinacy. 

In earlier papers (Cohen, 1993a, 1994a, 1993b) we have studied the conditions under 
which the constant-depth determinacy restriction can be relaxed while still allowing learn- 
ability for nonrecursive clauses. It turns out that most generalizations of constant-depth 
determinate clauses are not predictable, even without recursion. However, the language of 
nonrecursive clauses of constant locality is a pac-learnable generalization of constant-depth 
determinate clauses. Below, we will define this language, summarize the relevant previous 
results, and then address the question of the learnability of recursive local clauses. 

Define a variable $V$ appearing in a clause $C$ to be free if it appears in the body of $C$ but
not the head of $C$. Let $V_1$ and $V_2$ be two free variables appearing in a clause. $V_1$ touches $V_2$
if they appear in the same literal, and $V_1$ influences $V_2$ if it either touches $V_2$, or if it touches
some variable $V_3$ that influences $V_2$. The locale of a free variable $V$ is the set of literals that
either contain $V$, or that contain some free variable influenced by $V$. Informally, variable
$V_1$ influences variable $V_2$ if the choice of a binding for $V_1$ can affect the possible choices of
bindings for $V_2$.
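
The definitions of free variable, touching, influence, and locale can be computed mechanically. The following is a minimal sketch, assuming a clause is represented by the set of its head variables and a list of body literals whose arguments are all variables; the function names and the example clause are illustrative and do not come from the paper. The size of the largest locale is the quantity the text goes on to call the locality.

```python
def locales(head_vars, body):
    """Map each free variable to its locale (a set of body-literal indices).

    head_vars: variable names appearing in the clause head.
    body: list of (predicate, args) pairs; every arg is a variable name.
    """
    free = {v for _, args in body for v in args} - set(head_vars)
    # touches[v]: free variables sharing at least one literal with v
    touches = {v: set() for v in free}
    for _, args in body:
        shared = [v for v in args if v in free]
        for v in shared:
            touches[v].update(w for w in shared if w != v)
    result = {}
    for v in free:
        # reach = v plus everything v influences (transitive closure of touches)
        reach, stack = {v}, [v]
        while stack:
            for w in touches[stack.pop()]:
                if w not in reach:
                    reach.add(w)
                    stack.append(w)
        result[v] = {i for i, (_, args) in enumerate(body) if reach & set(args)}
    return result


def locality(head_vars, body):
    """Size of the largest locale (0 if the clause has no free variables)."""
    return max((len(loc) for loc in locales(head_vars, body).values()), default=0)


# Hypothetical clause: p(X) <- q(X,Y), r(Y,Z), s(X,W)
BODY = [("q", ("X", "Y")), ("r", ("Y", "Z")), ("s", ("X", "W"))]
print(locality({"X"}, BODY))
```

In the example clause, Y and Z influence each other through r, so both have the locale {q, r}, while W has the locale {s}; the largest locale therefore has size 2.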

The locality of a clause is the size of its largest locale. Let $k$-LocalNonRec denote the
language of nonrecursive clauses with locality $k$ or less. (That is, $k$-LocalNonRec is the
set of logic programs containing a single nonrecursive $k$-local clause.) The following facts
are known (Cohen, 1993b):

• For fixed $k$ and $a$, the language family $k$-LocalNonRec[$a$-$\mathcal{DB}$, $a$-$\mathcal{DEC}$] is uniformly
pac-learnable.

• For every constant $d$, every constant $a$, every database $DB \in a$-$\mathcal{DB}$, every declaration
$Dec \in a$-$\mathcal{D}et\mathcal{DEC}$, and every clause $C \in d$-DepthNonRec[$DB$, $Dec$], there is an
equivalent clause $C'$ in $k$-LocalNonRec[$DB$, $Dec$] of size bounded by $k|C|$, where $k$
is a function only of $a$ and $d$ (and hence is a constant if $d$ and $a$ are also constants.)

Hence

$k$-LocalNonRec[$\mathcal{DB}$, $a$-$\mathcal{DEC}$]

is a pac-learnable generalization of

$d$-DepthNonRec[$\mathcal{DB}$, $a$-$\mathcal{D}et\mathcal{DEC}$].

It is thus plausible to ask if recursive programs of $k$-local clauses are pac-learnable. Some
facts about the learnability of $k$-local programs follow immediately from previous results.
For example, an immediate consequence of the construction of Theorem 2 is that programs
with a polynomial number of linear recursive $k$-local clauses are not predictable for $k \ge 2$.
Similarly, Theorem 3 shows that a single recursive $k$-local clause is not predictable for $k \ge 4$.

It is still reasonable to ask, however, if the positive result for bounded-depth determinate
recursive clauses (Cohen, 1995) can be extended to $k$-ary closed recursive $k$-local clauses.
Unfortunately, we have the following negative result, which shows that even linear closed
recursive clauses are not learnable.

Theorem 4 Let Dfa[$s$] denote the language of deterministic finite automata with $s$ states,
and let $k$-LocalLinRec be the set of linear closed recursive $k$-local clauses. For any constant
$s$ there exists a database $DB_s \in 3$-$\mathcal{DB}$ and a declaration $Dec_s \in 3$-$\mathcal{DEC}$, both of size
polynomial in $s$, such that

$$\text{Dfa}[s] \trianglelefteq 3\text{-LocalLinRec}[DB_s, Dec_s]$$

Hence for $k \ge 3$ and $a \ge 3$, $k$-LocalLinRec[$a$-$\mathcal{DB}$, $a$-$\mathcal{DEC}$] is not uniformly polynomially
predictable under cryptographic assumptions.

Proof: Following Hopcroft and Ullman (1979) we will represent a DFA $M$ over the alphabet
$\Sigma$ as a tuple $(q_0, Q, F, \delta)$ where $q_0$ is the initial state, $Q$ is the set of states, $F$ is the set of
accepting states, and $\delta : Q \times \Sigma \rightarrow Q$ is the transition function (which we will sometimes
think of as a subset of $Q \times \Sigma \times Q$). To prove the theorem, we need to construct a database
$DB_s$ of size polynomial in $s$ such that every $s$-state DFA can be emulated by a linear
recursive $k$-local clause over $DB_s$.

Rather than directly emulating $M$, it will be convenient to emulate instead a modification
of $M$. Let $\hat{M}$ be a DFA with state set $\hat{Q} = Q \cup \{q_{(-1)}, q_e, q_f\}$, where $q_{(-1)}$, $q_e$ and $q_f$
are new states not found in $Q$. The initial state of $\hat{M}$ is $q_{(-1)}$. The only final state of $\hat{M}$ is
$q_f$. The transition function of $\hat{M}$ is

$$\hat{\delta} = \delta \cup \{(q_{(-1)}, a, q_0), (q_e, c, q_f)\} \cup \bigcup_{q_i \in F} \{(q_i, b, q_e)\}$$

where $a$, $b$, and $c$ are new letters not in $\Sigma$. Note that $\hat{M}$ is now a DFA over the alphabet
$\Sigma \cup \{a, b, c\}$, and, as described, need not be a complete DFA over this alphabet. (That
is, there may be pairs $(q_i, a)$ such that $\hat{\delta}(q_i, a)$ is undefined.) However, $\hat{M}$ can be easily
made complete by introducing an additional rejecting state $q_r$, and making every undefined
transition lead to $q_r$. More precisely, let $\delta'$ be defined as

$$\delta' = \hat{\delta} \cup \{(q_i, x, q_r) \mid q_i \in Q' \wedge x \in \Sigma \cup \{a, b, c\} \wedge (\nexists q_j : (q_i, x, q_j) \in \hat{\delta})\}$$

Thus $M' = (q_{(-1)}, \hat{Q} \cup \{q_r\}, \{q_f\}, \delta')$ is a "completed" version of $\hat{M}$, with $Q' = \hat{Q} \cup \{q_r\}$. We
will use $M'$ in the construction below; we will also let $Q' = \hat{Q} \cup \{q_r\}$ and $\Sigma' = \Sigma \cup \{a, b, c\}$.

Examples of $M$, $\hat{M}$ and $M'$ are shown in Figure 1. Notice that aside from the arcs into
and out of the rejecting state $q_r$, the state diagram of $M'$ is nearly identical to that of $M$.
The differences are that in $M'$ there is a new initial state $q_{(-1)}$ with a single outgoing arc
labeled $a$ to the old initial state $q_0$; also every final state of $M$ has in $M'$ an outgoing arc
labeled $b$ to a new state $q_e$, which in turn has a single outgoing arc labeled $c$ to the final
state $q_f$. It is easy to show that

$$x \in L(M) \text{ iff } axbc \in L(M')$$
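
The claimed equivalence between $M$ and $M'$ can be checked on small cases. The sketch below is only an illustration under stated assumptions: the DFA is stored as a dictionary from (state, letter) pairs to states, the fresh state names `q_-1`, `q_e`, `q_f`, `q_r` and the letters `a`, `b`, `c` are assumed not to occur in $M$, and the even-parity automaton is a hypothetical example rather than the one in Figure 1.

```python
from itertools import product


def complete(states, sigma, delta, q0, finals):
    """Return (start, delta', finals') for M' built from M = (q0, Q, F, delta).

    delta maps (state, letter) -> state. The new state names q_-1, q_e, q_f, q_r
    and the letters a, b, c are assumed not to occur in M.
    """
    hat = dict(delta)
    hat[("q_-1", "a")] = q0                 # new initial state, arc a to the old q0
    hat[("q_e", "c")] = "q_f"               # q_e --c--> the single final state
    for q in finals:
        hat[(q, "b")] = "q_e"               # every old final state --b--> q_e
    q_prime = set(states) | {"q_-1", "q_e", "q_f", "q_r"}
    sigma_prime = set(sigma) | {"a", "b", "c"}
    # every undefined transition leads to the rejecting state q_r
    d_prime = {(q, x): hat.get((q, x), "q_r") for q in q_prime for x in sigma_prime}
    return "q_-1", d_prime, {"q_f"}


def accepts(start, delta, finals, word):
    """Run a complete DFA on a sequence of letters."""
    state = start
    for letter in word:
        state = delta[(state, letter)]
    return state in finals


# Hypothetical example M: strings over {0,1} with an even number of 1s.
STATES, SIGMA, Q0, FINALS = {"even", "odd"}, {"0", "1"}, "even", {"even"}
DELTA = {("even", "0"): "even", ("even", "1"): "odd",
         ("odd", "0"): "odd", ("odd", "1"): "even"}
START, D_PRIME, F_PRIME = complete(STATES, SIGMA, DELTA, Q0, FINALS)


def agree(max_len):
    """Check x in L(M) iff axbc in L(M') for every x of at most max_len letters."""
    return all(
        accepts(Q0, DELTA, FINALS, x)
        == accepts(START, D_PRIME, F_PRIME, ("a", *x, "b", "c"))
        for n in range(max_len + 1)
        for x in product(sorted(SIGMA), repeat=n))
```

Calling `agree(6)` compares the two automata on every string of length at most six; an exhaustive check like this is evidence for the equivalence on one example, not a proof.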

Now, given a set of states $Q'$ we define a database $DB$ that contains the following
predicates:

• $arc_{q_i,\sigma,q_j}(S,X,T)$ is true for any $S \in Q'$, any $T \in Q'$, and any $X \in \Sigma'$, unless $S = q_i$,
$X = \sigma$, and $T \ne q_j$.

• $state(S)$ is true for any $S \in Q'$.

• $accept(c, nil, q_e, q_f)$ is true.

As motivation for the arc predicates, observe that in emulating $M'$ it is clearly useful to be
able to represent the transition function $\delta'$. The usefulness of the arc predicates is that any
transition function $\delta'$ can be represented using a conjunction of arc literals. In particular,
the conjunction

$$\bigwedge_{(q_i,\sigma,q_j) \in \delta'} arc_{q_i,\sigma,q_j}(S, X, T)$$

succeeds when $\delta'(S,X) = T$, and fails otherwise.

Let us now define the instance mapping $f_i$ as $f_i(x) = (f, D)$ where

$$f = accept(a, xbc, q_{(-1)}, q_0)$$

and $D$ is a set of facts that defines the components relation on the list that corresponds to
the string $xbc$. In other words, if $x = \sigma_1 \ldots \sigma_n$, then $D$ is the set of facts

$$components(\sigma_1 \ldots \sigma_n bc,\ \sigma_1,\ \sigma_2 \ldots \sigma_n bc)$$
$$components(\sigma_2 \ldots \sigma_n bc,\ \sigma_2,\ \sigma_3 \ldots \sigma_n bc)$$
$$\vdots$$
$$components(c,\ c,\ nil)$$

The declaration $Dec_n$ will be $Dec_n = (accept, 4, R)$ where $R$ contains the modes
$components(+, -, -)$, $state(-)$, and $arc_{q_i,\sigma,q_j}(+, +, +)$ for $q_i$, $q_j$ in $Q'$, and $\sigma \in \Sigma'$.
Finally, define the concept mapping $f_c(M)$ for a machine $M$ to be the clause
$$accept(X,Ys,S,T) \leftarrow \bigwedge_{(q_i,\sigma,q_j) \in \delta'} arc_{q_i,\sigma,q_j}(S,X,T) \wedge components(Ys,X1,Ys1) \wedge state(U) \wedge accept(X1,Ys1,T,U).$$

where $\delta'$ is the transition function for the corresponding machine $M'$ defined above. It is
easy to show this construction is polynomial.

In the clause $X$ is a letter in $\Sigma'$, $Ys$ is a list of such letters, and $S$ and $T$ are both states
in $Q'$. The intent of the construction is that the predicate accept will succeed exactly when
(a) the string $XYs$ is accepted by $M'$ when $M'$ is started in state $S$, and (b) the first action
taken by $M'$ on the string $XYs$ is to go from state $S$ to state $T$.

Since all of the initial transitions in $M'$ are from $q_{(-1)}$ to $q_0$ on input $a$, then if the
predicate accept has the claimed behavior, clearly the proposed mapping satisfies the requirements
of Theorem 1. To complete the proof, therefore, we must now verify that the
predicate accept succeeds iff $XYs$ is accepted by $M'$ in state $S$ with an initial transition to
$T$.

From the definition of DFAs the string $XYs$ is accepted by $M'$ in state $S$ with an initial
transition to $T$ iff one of the following two conditions holds.

• $\delta'(S,X) = T$, $Ys$ is the empty string and $T$ is a final state of $M'$, or;

• $\delta'(S,X) = T$, $Ys$ is a nonempty string (and hence has some head $X1$ and some tail
$Ys1$) and $X1Ys1$ is accepted by $M'$ in state $T$, with any initial transition.

The base fact $accept(c, nil, q_e, q_f)$ succeeds precisely when the first case holds, since in
$M'$ this transition is the only one to a final state. In the second case, the conjunction of the
arc conditions in the $f_c(M)$ clause succeeds exactly when $\delta'(S,X) = T$ (as noted above).
Further, the second conjunction in the clause succeeds when $Ys$ is a nonempty string
with head $X1$ and tail $Ys1$ and $X1Ys1$ is accepted by $M'$ in state $T$ with initial transition
to any state $U$, which corresponds exactly to the second case above.

Thus concept membership is preserved by the mapping. This completes the proof. ■ 
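
The behaviour of the accept clause can be illustrated by a small top-down proof search. This is a sketch under stated assumptions, not the paper's construction verbatim: lists are Python tuples with `()` standing for nil, the conjunction of arc literals is evaluated directly from a dictionary form of $\delta'$, the state and letter names introduced for $M'$ are assumed to be fresh, and the even-parity DFA is a hypothetical example.

```python
from itertools import product


def delta_prime(states, sigma, delta, q0, finals):
    """Completed transition function of M' and its state set Q' (names assumed fresh)."""
    hat = {**delta, ("q_-1", "a"): q0, ("q_e", "c"): "q_f"}
    hat.update({(q, "b"): "q_e" for q in finals})
    q_prime = set(states) | {"q_-1", "q_e", "q_f", "q_r"}
    sigma_prime = set(sigma) | {"a", "b", "c"}
    return {(q, x): hat.get((q, x), "q_r") for q in q_prime for x in sigma_prime}, q_prime


def arcs(dp, s, x, t):
    """Conjunction of arc literals: each fails only if S = q_i, X = sigma and T != q_j."""
    return all(not (s == qi and x == sym and t != qj) for (qi, sym), qj in dp.items())


def accept(dp, q_prime, x, ys, s, t):
    """Top-down proof search for accept(X, Ys, S, T)."""
    if (x, ys, s, t) == ("c", (), "q_e", "q_f"):       # the base fact in the database
        return True
    if not arcs(dp, s, x, t) or not ys:                # components(Ys, X1, Ys1) fails on nil
        return False
    x1, ys1 = ys[0], ys[1:]
    # state(U) lets U range over Q'; the recursive call then constrains it
    return any(accept(dp, q_prime, x1, ys1, t, u) for u in q_prime)


def run_dfa(delta, q0, finals, word):
    """Direct simulation of the original DFA M, used only for comparison."""
    state = q0
    for letter in word:
        state = delta[(state, letter)]
    return state in finals


# Hypothetical example M: strings over {0,1} with an even number of 1s.
DELTA = {("even", "0"): "even", ("even", "1"): "odd",
         ("odd", "0"): "odd", ("odd", "1"): "even"}
DP, QP = delta_prime({"even", "odd"}, {"0", "1"}, DELTA, "even", {"even"})


def emulate(x):
    """Instance mapping: pose the goal accept(a, xbc, q_-1, q_0)."""
    return accept(DP, QP, "a", tuple(x) + ("b", "c"), "q_-1", "even")
```

Comparing `emulate(x)` with `run_dfa(DELTA, "even", {"even"}, x)` over all short strings is one way to confirm that the clause and the automaton agree on this example.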



4. DNF-Hardness Results for Recursive Programs 

To summarize previous results for determinate clauses, it was shown that while a single
$k$-ary closed recursive depth-$d$ clause is pac-learnable (Cohen, 1995), a set of $n$ linear closed
recursive depth-$d$ clauses is not; further, even a single $n$-ary closed recursive depth-$d$ clause
is not pac-learnable. There is still a large gap between the positive and negative results,
however: in particular, the learnability of recursive programs containing a constant number
of $k$-ary recursive clauses has not yet been established.

In this section we will investigate the learnability of these classes of programs. We will 
show that programs with either two linear closed recursive clauses or one linear closed re- 
cursive clause and one base case are as hard to learn as boolean functions in disjunctive 
normal form (DNF). The pac-learnability of DNF is a long-standing open problem in com- 
putational learning theory; the import of these results, therefore, is that establishing the 
learnability of these classes will require some substantial advance in computational learning 
theory. 
4.1 A Linear Recursive Clause Plus a Base Clause

Previous work has established that two-clause constant-depth determinate programs consisting
of one linear recursive clause and one nonrecursive clause can be identified, given
two types of oracles: the standard equivalence-query oracle, and a "basecase oracle" (Cohen,
1995). (The basecase oracle determines if an example is covered by the nonrecursive clause
alone.) In this section we will show that in the absence of the basecase oracle, the learning
problem is as hard as learning boolean DNF.

In the discussion below, DNF[$n$,$r$] denotes the language of $r$-term boolean functions in
disjunctive normal form over $n$ variables.

Theorem 5 Let $d$-Depth-2-Clause be the set of 2-clause programs consisting of one
clause in $d$-DepthLinRec and one clause in $d$-DepthNonRec. Then for any $n$ and
any $r$ there exists a database $DB_{n,r} \in 2$-$\mathcal{DB}$ and a declaration $Dec_{n,r} \in 2$-$\mathcal{DEC}$, both of sizes
polynomial in $n$ and $r$, such that

$$\text{DNF}[n,r] \trianglelefteq 1\text{-Depth-2-Clause}[DB_{n,r}, Dec_{n,r}]$$

Hence for $a \ge 2$ and $d \ge 1$ the language family $d$-Depth-2-Clause[$\mathcal{DB}$, $a$-$\mathcal{D}et\mathcal{DEC}$] is
uniformly polynomially predictable only if DNF is polynomially predictable.

Proof: We will produce a $DB_{n,r} \in \mathcal{DB}$ and $Dec_{n,r} \in 2$-$\mathcal{D}et\mathcal{DEC}$ such that predicting
DNF can be reduced to predicting 1-Depth-2-Clause[$DB_{n,r}$, $Dec_{n,r}$]. The construction
makes use of a trick first used in Theorem 3 of (Cohen, 1993a), in which a DNF formula is
emulated by a conjunction containing a single variable $Y$ which is existentially quantified
over a restricted range.

We begin with the instance mapping $f_i$. An assignment $\eta = b_1 \ldots b_n$ will be converted
to the extended instance $(f, D)$ where

$$f = p(1)$$
$$D = \{bit_i(b_i)\}_{i=1}^{n}$$

Next, we define the database $DB_{n,r}$ to contain the binary predicates $true_1, false_1, \ldots, true_r,
false_r$ which behave as follows:

• $true_i(X, Y)$ succeeds if $X = 1$, or if $Y \in \{1, \ldots, r\} - \{i\}$.

• $false_i(X, Y)$ succeeds if $X = 0$, or if $Y \in \{1, \ldots, r\} - \{i\}$.

Further, $DB_{n,r}$ contains facts that define the predicate $succ(Y,Z)$ to be true whenever
$Z = Y + 1$, and both $Y$ and $Z$ are numbers between 1 and $r$. Clearly the size of $DB_{n,r}$ is
polynomial in $r$.

Let $Dec_{n,r} = (p, 1, R)$ where $R$ contains the modes $bit_i(-)$, for $i = 1, \ldots, n$; $true_j(+, +)$
and $false_j(+, +)$, for $j = 1, \ldots, r$; and $succ(+, -)$.

Now let $\phi$ be an $r$-term DNF formula $\phi = \bigvee_{i=1}^{r} \bigwedge_{j=1}^{s_i} l_{ij}$ over the variables $v_1, \ldots, v_n$.
We may assume without loss of generality that $\phi$ contains exactly $r$ terms, since any DNF
formula with fewer than $r$ terms can be padded to exactly $r$ terms by adding terms of the
form $v_1 \overline{v_1}$. We now define the concept mapping $f_c(\phi)$ to be the program $C_R, C_B$ where $C_R$
is the linear recursive depth 1 determinate clause

$$p(Y) \leftarrow succ(Y,Z) \wedge p(Z)$$

and $C_B$ is the nonrecursive depth 1 determinate clause

$$p(Y) \leftarrow \bigwedge_{k=1}^{n} bit_k(X_k) \wedge \bigwedge_{i=1}^{r} \bigwedge_{j=1}^{s_i} B_{ij}$$

where $B_{ij}$ is defined as follows:

$$B_{ij} = \begin{cases} true_i(X_k, Y) & \text{if } l_{ij} = v_k \\ false_i(X_k, Y) & \text{if } l_{ij} = \overline{v_k} \end{cases}$$

Background database:

  for $i = 1, \ldots, r$
    $true_i(b, y)$ for all $b, y$ : $b = 1$ or $y \in \{1, \ldots, r\}$ but $y \ne i$
    $false_i(b, y)$ for all $b, y$ : $b = 0$ or $y \in \{1, \ldots, r\}$ but $y \ne i$
  $succ(y, z)$ if $z = y + 1$ and $y \in \{1, \ldots, r\}$ and $z \in \{1, \ldots, r\}$

DNF formula: $(v_1 \wedge \overline{v_3} \wedge v_4) \vee (\overline{v_2} \wedge \overline{v_3}) \vee (v_1 \wedge \overline{v_4})$

Equivalent program:

  $p(Y) \leftarrow succ(Y,Z) \wedge p(Z).$
  $p(Y) \leftarrow bit_1(X_1) \wedge bit_2(X_2) \wedge bit_3(X_3) \wedge bit_4(X_4) \wedge$
      $true_1(X_1,Y) \wedge false_1(X_3,Y) \wedge true_1(X_4,Y) \wedge$
      $false_2(X_2,Y) \wedge false_2(X_3,Y) \wedge$
      $true_3(X_1,Y) \wedge false_3(X_4,Y).$

Instance mapping: $f_i(1011) = (p(1), \{bit_1(1), bit_2(0), bit_3(1), bit_4(1)\})$

Figure 2: Reducing DNF to a recursive program

An example of the construction is shown in Figure 2; we suggest that the reader refer
to this figure at this point. The basic idea behind the construction is that first, the clause
$C_B$ will succeed only if the variable $Y$ is bound to $i$ and the $i$-th term of $\phi$ succeeds (the
definitions of $true_i$ and $false_i$ are designed to ensure that this property holds); second, the
recursive clause $C_R$ is constructed so that the program $f_c(\phi)$ succeeds iff $C_B$ succeeds with
$Y$ bound to one of the values $1, \ldots, r$.

We will now argue more rigorously for the correctness of the construction. Clearly, $f_i(\eta)$
and $f_c(\phi)$ are of the same size as $\eta$ and $\phi$ respectively. Since $DB_{n,r}$ is also of polynomial
size, this reduction is polynomial.

Figure 3 shows the possible proofs that can be constructed with the program $f_c(\phi)$;
notice that the program $f_c(\phi)$ succeeds exactly when the clause $C_B$ succeeds for some value
of $Y$ between 1 and $r$. Now, if $\phi$ is true then some term $T_i = \bigwedge_{j=1}^{s_i} l_{ij}$ must be true; in
this case $\bigwedge_{j=1}^{s_i} B_{ij}$ succeeds with $Y$ bound to the value $i$ and $\bigwedge_{j=1}^{s_{i'}} B_{i'j}$ for every $i' \ne i$ also
succeeds with $Y$ bound to $i$. On the other hand, if $\phi$ is false for an assignment, then each $T_i$
fails, and hence for every possible binding of $Y$ generated by repeated use of the recursive
clause $C_R$ the base clause $C_B$ will also fail. Thus concept membership is preserved by the
mapping.

This concludes the proof. ■

[Figure 3: Space of proofs possible with the program $f_c(\phi)$. The proof tree starts at $p(1)$ and ends at $succ(n-1,n)$, $p(n)$, and $B(n)$.]

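
The reduction of Theorem 5 can be exercised on the formula of Figure 2. The following sketch assumes a DNF is given as a list of terms, each a list of (variable index, polarity) pairs, and simulates a top-down proof of $p(1)$; the representation and helper names are illustrative choices, not part of the paper.

```python
from itertools import product

# The formula of Figure 2: (v1 & ~v3 & v4) | (~v2 & ~v3) | (v1 & ~v4)
PHI = [[(1, True), (3, False), (4, True)],
       [(2, False), (3, False)],
       [(1, True), (4, False)]]


def true_i(i, x, y, r):
    """true_i(X, Y): succeeds if X = 1, or if Y is in {1..r} - {i}."""
    return x == 1 or (1 <= y <= r and y != i)


def false_i(i, x, y, r):
    """false_i(X, Y): succeeds if X = 0, or if Y is in {1..r} - {i}."""
    return x == 0 or (1 <= y <= r and y != i)


def base_clause(phi, bits, y):
    """C_B: p(Y) <- bit_k(X_k) for every k, and B_ij for every literal l_ij."""
    r = len(phi)
    return all((true_i if pos else false_i)(i, bits[k - 1], y, r)
               for i, term in enumerate(phi, start=1) for k, pos in term)


def p(phi, bits, y=1):
    """Prove p(Y): try C_B first, then C_R: p(Y) <- succ(Y, Z), p(Z)."""
    if base_clause(phi, bits, y):
        return True
    # succ(Y, Z) holds only when Z = Y + 1 and both lie in 1..r
    return y + 1 <= len(phi) and p(phi, bits, y + 1)


def eval_dnf(phi, bits):
    """Direct evaluation of the DNF, used only for comparison."""
    return any(all((bits[k - 1] == 1) == pos for k, pos in term) for term in phi)
```

For the instance of Figure 2, `p(PHI, [1, 0, 1, 1])` poses the goal $p(1)$ with the facts $bit_1(1), bit_2(0), bit_3(1), bit_4(1)$; comparing `p` with `eval_dnf` over all 16 assignments checks that membership is preserved for this formula.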
4.2 Two Linear Recursive Clauses 

Recall again that a single linear closed recursive clause is identifiable from equivalence 
queries (Cohen, 1995). A construction similar to that used in Theorem 5 can be used to 
show that this result cannot be extended to programs with two linear recursive clauses. 

Theorem 6 Let $d$-Depth-2-Clause' be the set of 2-clause programs consisting of two
clauses in $d$-DepthLinRec. (Thus we assume that the base case of the recursion is given
as background knowledge.) Then for any constants $n$ and $r$ there exists a database $DB_{n,r} \in$
2-$\mathcal{DB}$ and a declaration $Dec_{n,r} \in 2$-$\mathcal{DEC}$, both of sizes polynomial in $n$, such that

$$\text{DNF}[n,r] \trianglelefteq 1\text{-Depth-2-Clause}'[DB_{n,r}, Dec_{n,r}]$$

Hence for any constants $a \ge 2$ and $d \ge 1$ the language family

$d$-Depth-2-Clause'[$\mathcal{DB}$, $a$-$\mathcal{D}et\mathcal{DEC}$]






is uniformly polynomially predictable only if DNF is polynomially predictable. 

Proof: As before, the proof makes use of a prediction-preserving reducibility from DNF to
$d$-Depth-2-Clause'[$DB$, $Dec$] for a specific $DB$ and $Dec$. Let us assume that $\phi$ is a DNF
with $r$ terms, and further assume that $r = 2^k$. (Again, this assumption is made without
loss of generality, since the number of terms in $\phi$ can be increased by padding with vacuous
terms.) Now consider a complete binary tree of depth $k + 1$. The $k$-th level of this tree has
exactly $r$ nodes; let us label these nodes $1, \ldots, r$, and give the other nodes arbitrary labels.
Now construct a database $DB_{n,r}$ as in Theorem 5, except for the following changes:



• The predicates $true_i(b,y)$ and $false_i(b,y)$ also succeed when $y$ is the label of a node at
some level below $k$.

• Rather than the predicate $succ$, the database contains two predicates $leftson$ and
$rightson$ that encode the relationship between nodes in the binary tree.

• The database includes the facts $p(\omega_1), \ldots, p(\omega_{2r})$, where $\omega_1, \ldots, \omega_{2r}$ are the leaves
of the binary tree. These will be used as the base cases of the recursive program that
is to be learned.

Let $\rho$ be the label of the root of the binary tree. We define the instance mapping to be

$$f_i(b_1 \ldots b_n) = (p(\rho), \{bit_1(b_1), \ldots, bit_n(b_n)\})$$

Note that except for the use of $\rho$ rather than 1, this is identical to the instance mapping
used in Theorem 5. Also let $Dec_{n,r} = (p, 1, R)$ where $R$ contains the modes $bit_i(-)$, for $i =
1, \ldots, n$; $true_j(+, +)$ and $false_j(+, +)$, for $j = 1, \ldots, r$; $leftson(+, -)$; and $rightson(+, -)$.
The concept mapping $f_c(\phi)$ is the pair of clauses $R_1, R_2$, where $R_1$ is the clause

$$p(Y) \leftarrow \bigwedge_{k=1}^{n} bit_k(X_k) \wedge \bigwedge_{i=1}^{r} \bigwedge_{j=1}^{s_i} B_{ij} \wedge leftson(Y,Z) \wedge p(Z)$$

and $R_2$ is the clause

$$p(Y) \leftarrow \bigwedge_{k=1}^{n} bit_k(X_k) \wedge \bigwedge_{i=1}^{r} \bigwedge_{j=1}^{s_i} B_{ij} \wedge rightson(Y,Z) \wedge p(Z)$$

Note that both of these clauses are linear recursive, determinate, and have depth 1. Also,
the construction is clearly polynomial. It remains to show that membership is preserved.

Figure 4 shows the space of proofs that can be constructed with the program $f_c(\phi)$; as
in Figure 3, $B(i)$ abbreviates the conjunction $\bigwedge bit_i(X_i) \wedge \bigwedge \bigwedge B_{ij}$. Notice that the program
will succeed only if the recursive calls manage to finally recurse to one of the base cases
$p(\omega_1), \ldots, p(\omega_{2r})$, which correspond to the leaves of the binary tree. Both clauses will
succeed on the first $k - 1$ levels of the tree. However, to reach the base cases of the
recursion at the leaves of the tree, the recursion must pass through the $k$-th level of the tree;
that is, one of the clauses above must succeed on some node $y$ of the binary tree, where
$y$ is on the $k$-th level of the tree, and hence the label of $y$ is a number between 1 and $r$.
The program thus succeeds on $f_i(\eta)$ precisely when there is some number $y$ between 1 and
$r$ such that the conjunction $B(y)$ succeeds, which (by the argument given in Theorem 5)
can happen if and only if $\phi$ is satisfied by the assignment $\eta$. Thus, the mappings preserve
concept membership. This completes the proof. ■

[Figure 4: Proofs possible with the program $f_c(\phi)$. The proof tree is rooted at $p(\rho)$, passes through the nodes $B(1)\ p(LL \ldots L)$ through $B(n)\ p(RR \ldots R)$, and ends at the leaves $p(\omega_1), \ldots, p(\omega_{2r})$.]
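
The tree-shaped recursion of Theorem 6 can be simulated in the same style. In this sketch the binary tree is numbered as a heap (root 1, children $2m$ and $2m+1$), which is an assumption made for convenience; under it the $k$-th level holds nodes $r, \ldots, 2r-1$, carrying labels $1, \ldots, r$, and every node numbered $2r$ or more is a leaf. The formula of Figure 2 is padded with a vacuous fourth term so that $r$ is a power of two.

```python
from itertools import product

# Terms are lists of (variable index, polarity) literals; len(PHI) must be a
# power of two, so the Figure 2 formula is padded with the term v1 & ~v1.
PHI = [[(1, True), (3, False), (4, True)],
       [(2, False), (3, False)],
       [(1, True), (4, False)],
       [(1, True), (1, False)]]


def body_ok(phi, bits, node):
    """Non-recursive part shared by R_1 and R_2, evaluated at a tree node."""
    r = len(phi)
    if node < r:                       # levels above the k-th: every B_ij succeeds
        return True
    y = node - r + 1                   # k-th level nodes carry the labels 1..r
    return all((bits[k - 1] == 1) == pos or y != i
               for i, term in enumerate(phi, start=1) for k, pos in term)


def p(phi, bits, node=1):
    """Prove p(node) using the leaf facts and the clauses R_1 and R_2."""
    r = len(phi)
    if node >= 2 * r:                  # a leaf: p(leaf) is a fact in the database
        return True
    if not body_ok(phi, bits, node):
        return False
    # R_1 follows leftson(Y, Z); R_2 follows rightson(Y, Z)
    return p(phi, bits, 2 * node) or p(phi, bits, 2 * node + 1)


def eval_dnf(phi, bits):
    """Direct evaluation of the DNF, used only for comparison."""
    return any(all((bits[k - 1] == 1) == pos for k, pos in term) for term in phi)
```

Each proof follows one root-to-leaf path, so its depth is $k + 1$, logarithmic in the number of terms; this matches the depth observation made in the text after the proof.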

Notice that the programs f c ((f>) used in this proof all have the property that the depth 
of every proof is logarithmic in the size of the instances. This means that the hardness 
result holds even if one additionally restricts the class of programs to have a logarithmic 
depth bound. 

4.3 Upper Bounds on the Difficulty of Learning 

The previous sections showed that several highly restricted classes of recursive programs 
are at least as hard to predict as DNF. In this section we will show that these restricted 
classes are also no harder to predict than DNF. 

We will wish to restrict the depth of a proof constructed by a target program. Thus, let
$h(n)$ be any function; we will use $\text{Lang}_{h(n)}$ for the set of programs in the class Lang such
that all proofs of an extended instance $(f, D)$ have depth bounded by $h(\|D\|)$.
Theorem 7 Let DNF[$n$,*] be the language of DNF boolean functions (with any number
of terms), and recall that $d$-Depth-2-Clause is the language of 2-clause programs consisting
of one clause in $d$-DepthLinRec and one clause in $d$-DepthNonRec, and that
$d$-Depth-2-Clause' is the language of 2-clause programs consisting of two clauses in
$d$-DepthLinRec.

For all constants $d$ and $a$, and all databases $DB \in \mathcal{DB}$ and declarations $Dec \in a$-$\mathcal{D}et\mathcal{DEC}$,
there is a polynomial function $poly(n)$ such that

• $d$-Depth-2-Clause[$DB$, $Dec$] $\trianglelefteq$ DNF[$poly(\|DB\|)$, *]

• $d$-Depth-2-Clause'$_{h(n)}$[$DB$, $Dec$] $\trianglelefteq$ DNF[$poly(\|DB\|)$, *] if $h(n)$ is bounded by $c \log n$
for some constant $c$.

Hence if DNF[$n$, *] is polynomially predictable, then each of these language families
is uniformly polynomially predictable.

Proof: The proof relies on several facts established in the companion paper (Cohen, 1995).

• For every declaration $Dec$, there is a clause $BOTTOM^*_d(Dec)$ such that every nonrecursive
depth-$d$ determinate clause $C$ is equivalent to some subclause of $BOTTOM^*_d$.
Further, the size of $BOTTOM^*_d$ is polynomial in the size of $Dec$. This means that the language
of subclauses of $BOTTOM^*_d$ is a normal form for nonrecursive constant-depth
determinate clauses.

• Every linear closed recursive clause $C_R$ that is constant-depth determinate is equivalent
to some subclause of $BOTTOM^*_d$ plus a recursive literal $L_r$; further, there are
only a polynomial number of possible recursive literals $L_r$.

• For any constants $a$, $a'$, and $d$, any database $DB \in a$-$\mathcal{DB}$, any declaration $Dec =
(p, a', R)$, and any program $P$ in $d$-Depth-2-Clause[$DB$, $Dec$],
the depth of a terminating proof constructed using $P$ is no more than $h_{max}$, where
$h_{max}$ is a polynomial in the size of $DB$ and $Dec$.

• It can be assumed without loss of generality that the database $DB$ and all descriptions
$D$ contain an equality predicate, where an equality predicate is simply a predicate
$equal(X, Y)$ which is true exactly when $X = Y$.

The idea of the proof is to construct a prediction-preserving reduction from the two
classes of recursive programs listed above to DNF. We will begin with two lemmas.

Lemma 8 Let $Dec \in a$-$\mathcal{D}et\mathcal{DEC}$, and let $C$ be a nonrecursive depth-$d$ determinate clause
consistent with $Dec$. Let $\text{Subclause}_C$ denote the language of subclauses of $C$, and let
Monomial[$u$] denote the language of monomials over $u$ variables. Then there is a polynomial
$poly_1$ so that for any database $DB \in \mathcal{DB}$,

$$\text{Subclause}_C[DB, Dec] \trianglelefteq \text{Monomial}[poly_1(\|DB\|)]$$
Proof of lemma: Follows immediately from the construction used in Theorem 1 of
Džeroski, Muggleton, and Russell (1992). (The basic idea of the construction
is to introduce a propositional variable representing the "success" of each connected
chain of literals in $C$. Any subclause of $C$ can then be represented as a conjunction of these
propositions.) ■

This lemma can be extended as follows. 

Lemma 9 Let $Dec \in a$-$\mathcal{D}et\mathcal{DEC}$, and let $S = \{C_1, \ldots, C_r\}$ be a set of $r$ nonrecursive depth-$d$
determinate clauses consistent with $Dec$, each of length $n$ or less. Let $\text{Subclause}_S$ denote
the set of all programs of the form $P = (D_1, \ldots, D_s)$ such that each $D_i$ is a subclause of
some $C_j \in S$.

Then there is a polynomial $poly_2$ so that for any database $DB \in \mathcal{DB}$,

$$\text{Subclause}_S[DB, Dec] \trianglelefteq \text{DNF}[poly_2(\|DB\|, r), *]$$

Proof of lemma: By Lemma 8, for each $C_i \in S$, there is a polynomial-size set of variables $V_i$
such that every clause in $\text{Subclause}_{C_i}$ can be emulated by a monomial
over $V_i$. Let $V = \bigcup_{i=1}^{r} V_i$. Clearly, $|V|$ is polynomial in $n$ and $r$, and every clause in
$\bigcup_i \text{Subclause}_{C_i}$ can be also emulated by a monomial over $V$. Further, every disjunction
of $r$ such clauses can be represented by a disjunction of such monomials.

Since the $C_i$'s all satisfy a single declaration $Dec = (p, a, R)$, they have heads with the
same principal function and arity; further, we may assume (without loss of generality, since
an equality predicate is assumed) that the variables appearing in the heads of these clauses
are all distinct. Since the $C_i$'s are also nonrecursive, every program $P \in \text{Subclause}_S$ can
be represented as a disjunction $D_1 \vee \ldots \vee D_r$ where for all $i$, $D_i \in (\bigcup_i \text{Subclause}_{C_i})$. Hence
every $P \in \text{Subclause}_S$ can be represented by an $r$-term DNF over the set of variables $V$. ■

Let us now introduce some additional notation. If $C$ and $D$ are clauses, then we will use
$C \sqcap D$ to denote the result of resolving $C$ and $D$ together, and $C^i$ to denote the result of
resolving $C$ with itself $i$ times. Note that $C \sqcap D$ is unique if $C$ is linear recursive and $C$ and
$D$ have the same predicate in their heads (since there will be only one pair of complementary
literals.)

Now, consider some target program

$$P = (C_R, C_B) \in d\text{-Depth-2-Clause}[DB, Dec]$$

where $C_R$ is the recursive clause and $C_B$ is the base. The proof of any extended instance
$(f, D)$ must use clause $C_R$ repeatedly $h$ times and then use clause $C_B$ to resolve away
the final subgoal. Hence the nonrecursive clause $C_R^h \sqcap C_B$ could also be used to cover the
instance $(f, D)$.

Since the depth of any proof for this class of programs is bounded by a number $h_{max}$
that is polynomial in $\|DB\|$ and $n_e$, the nonrecursive program

$$P' = \{C_R^h \sqcap C_B : 0 \le h \le h_{max}\}$$
is equivalent to $P$ on extended instances of size $n_e$ or less.

Finally, recall that we can assume that $C_B$ is a subclause of $BOTTOM^*_d$; also, there
is a polynomial-sized set $\mathcal{L}_R = L_{r_1}, \ldots, L_{r_p}$ of closed recursive literals such that for some
$L_{r_i} \in \mathcal{L}_R$, the clause $C_R$ is a subclause of $BOTTOM^*_d \cup L_{r_i}$. This means that if we let $S_1$
be the polynomial-sized set

$$S_1 = \{(BOTTOM^*_d \cup L_{r_i})^h \sqcap BOTTOM^*_d \mid 0 \le h \le h_{max} \text{ and } L_{r_i} \in \mathcal{L}_R\}$$

then $P' \in \text{Subclause}_{S_1}$. Thus by Lemma 9, $d$-Depth-2-Clause $\trianglelefteq$ DNF. This concludes
the proof of the first statement in the theorem.
To show that

$$d\text{-Depth-2-Clause}'_{h(n)}[DB, Dec] \trianglelefteq \text{DNF}[poly(\|DB\|), *]$$

a similar argument applies. Let us again introduce some notation, and define
$\text{MESH}_{h,n}(C_{R_1}, C_{R_2})$ as the set of all clauses of the form

$$C_{R_{i,1}} \sqcap C_{R_{i,2}} \sqcap \ldots \sqcap C_{R_{i,h'}}$$

where for all $j$, $C_{R_{i,j}} = C_{R_1}$ or $C_{R_{i,j}} = C_{R_2}$, and $h' \le h(n)$. Notice that for functions
$h(n) \le c \log n$ the number of such clauses is polynomial in $n$.
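
The polynomial bound on the number of such clauses can be verified by a short count. As a sketch, assume that each clause in the set is determined by its sequence of $h'$ choices between $C_{R_1}$ and $C_{R_2}$, that $1 \le h' \le h(n)$, and that the logarithm is taken base 2 (none of these conventions is stated explicitly in the text):

```latex
\left|\mathrm{MESH}_{h,n}(C_{R_1}, C_{R_2})\right|
  \;\le\; \sum_{h'=1}^{h(n)} 2^{h'}
  \;=\; 2^{h(n)+1} - 2
  \;\le\; 2 \cdot 2^{c \log_2 n}
  \;=\; 2\,n^{c}
```

For a fixed constant $c$ this is a polynomial in $n$, whereas a proof depth linear in $n$ would give exponentially many clauses, which is why the second reduction needs the logarithmic depth bound.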

Now let $p$ be the predicate appearing in the heads of $C_{R_1}$ and $C_{R_2}$, and let $\hat{C}$ (respectively
$\widehat{DB}$) be a version of $C$ ($DB$) in which every instance of the predicate $p$ has been replaced
with a new predicate $\hat{p}$. If $P$ is a recursive program $P = \{C_{R_1}, C_{R_2}\}$ in $d$-Depth-2-Clause'
over the database $DB$, then $P \wedge DB$ is equivalent$^4$ to the nonrecursive program $P' \wedge \widehat{DB}$,
where

$$P' = \{\hat{C} \mid C \in \text{MESH}_{h,n_e}(C_{R_1}, C_{R_2})\}$$

Now recall that there are a polynomial number of recursive literals $L_{r_i}$, and hence a
polynomial number of pairs of recursive literals $L_{r_i}, L_{r_j}$. This means that the set of clauses

$$S_2 = \bigcup_{(L_{r_i}, L_{r_j}) \in \mathcal{L}_R \times \mathcal{L}_R} \{\hat{C} \mid C \in \text{MESH}_{h,n_e}(BOTTOM^*_d \cup L_{r_i}, BOTTOM^*_d \cup L_{r_j})\}$$

is also polynomial-sized; furthermore, for any program $P$ in the language $d$-Depth-2-Clause',
$P' \in \text{Subclause}_{S_2}$. The second part of the theorem now follows by application of Lemma 9.
■

An immediate corollary of this result is that Theorems 6 and 5 can be strengthened as 
follows. 

Corollary 10 For all constants $d \ge 1$ and $a \ge 2$, the language family

\[ d\text{-Depth-2-Clause}[\mathcal{DB}, a\text{-}\mathcal{D}et\mathcal{DEC}] \]

is uniformly polynomially predictable if and only if DNF is polynomially predictable.
For all constants $d \ge 1$ and $a \ge 2$, the language family

\[ d\text{-Depth-2-Clause}'[\mathcal{DB}, a\text{-}\mathcal{D}et\mathcal{DEC}] \]

is uniformly polynomially predictable if and only if DNF is polynomially predictable.

4. On extended instances of size $n_e$ or less.
Thus in an important sense these learning problems are equivalent to learning boolean
DNF. This does not resolve the questions of the learnability of these languages, but does 
show that their learnability is a difficult formal problem: the predictability of boolean DNF 
is a long-standing open problem in computational learning theory. 

5. Related Work 

The work described in this paper differs from previous formal work on learning logic pro- 
grams in simultaneously allowing background knowledge, function-free programs, and recur- 
sion. We have also focused exclusively on computational limitations on efficient learnability 
that are associated with recursion, as we have considered only languages known to be pac- 
learnable in the nonrecursive case. Since the results of this paper are all negative, we have 
concentrated on the model of polynomial predictability; negative results in this model im- 
mediately imply a negative result in the stronger model of pac-learnability, and also imply 
negative results for all strictly more expressive languages. 

Among the most closely related prior results are the negative results we have previously 
obtained for certain classes of nonrecursive function-free logic programs (Cohen, 1993b).
These results are similar in character to the results described here, but apply to nonrecursive 
languages. Similar cryptographic results have been obtained by Frazier and Page (1993) for 
certain classes of programs (both recursive and nonrecursive) that contain function symbols 
but disallow background knowledge. 

Some prior negative results have also been obtained on the learnability of other first- 
order languages using the proof technique of consistency hardness (Pitt & Valiant, 1988). 
Haussler (1989) showed that the language of "existential conjunction concepts" is not pac- 
learnable by showing that it can be hard to find a concept in the language consistent with a 
given set of examples. Similar results have also been obtained for two restricted languages 
of Horn clauses (Kietz, 1993); a simple description logic (Cohen & Hirsh, 1994); and for the 
language of sorted first-order terms (Page & Frisch, 1992). All of these results, however, are 
specific to the model of pac-learnability, and none can be easily extended to the polynomial
predictability model considered here. The results also do not extend to languages more 
expressive than these specific constrained languages. Finally, none of these languages allow 
recursion. 

To our knowledge, there are no other negative learnability results for first-order lan- 
guages. A discussion of prior positive learnability results for first-order languages can be 
found in the companion paper (Cohen, 1995). 

6. Summary 

This paper and its companion (Cohen, 1995) have considered a large number of different 
subsets of Datalog. Our aim has been to be not comprehensive, but systematic: in particu- 
lar, we wished to find precisely where the boundaries of learnability lie as various syntactic 
restrictions are imposed and relaxed. Since it is all too easy for a reader to "miss the forest 
for the trees", we will now briefly summarize the results contained in this paper, together 
with the positive results of the companion paper (Cohen, 1995). 
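Local Clauses    Constant-Depth Determinate Clauses

$nC_R^{-}$       $nC_R^{-}$    $nC_R|C_B^{-}$    $nC_R,C_B^{-}$                   $k \times nC_R^{-}$                   $n \times nC_R^{-}$
$kC_R^{-}$       $kC_R^{+}$    $kC_R|C_B^{+}$    $kC_R,C_B^{\ge \mathrm{DNF}}$    $k \times kC_R^{\ge \mathrm{DNF}}$    $n \times kC_R^{-}$
$1C_R^{-}$       $1C_R^{+}$    $1C_R|C_B^{+}$    $1C_R,C_B^{= \mathrm{DNF}}$      $2 \times 1C_R^{= \mathrm{DNF}}$      $n \times 1C_R^{-}$

Table 1: A summary of the learnability results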

Throughout these papers, we have assumed that a polynomial amount of background 
knowledge exists; that the programs being learned contain no function symbols; and that 
literals in the body of a clause have small arity. We have also assumed that recursion is 
closed, meaning that no output variables appear in a recursive clause; however, we believe 
that this restriction can be relaxed without fundamentally changing the results of the paper. 

In the companion paper (Cohen, 1995) we showed that a single nonrecursive constant- 
depth determinate clause was learnable in the strong model of identification from equivalence 
queries. In this learning model, one is given access to an oracle for counterexamples — that 
is, an oracle that will find, in unit time, an example on which the current hypothesis is 
incorrect — and must reconstruct the target program exactly from a polynomial number of 
these counterexamples. This result implies that a single nonrecursive constant-depth deter- 
minate clause is pac-learnable (as the counterexample oracle can be emulated by drawing 
random examples in the pac setting). The result is not novel (Dzeroski et al., 1992); however 
the proof given is independent, and is also of independent interest. Notably, it is somewhat 
more rigorous than earlier proofs, and also proves the result directly, rather than via reduc- 
tion to a propositional learning problem. The proof also introduces a simple version of the 
forced simulation technique, variants of which are used in all of the positive results. 
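
The emulation of the counterexample oracle by random sampling can be sketched as follows. This is a minimal illustration of a standard construction rather than the algorithm of the companion paper: the i-th equivalence query is replaced by a batch of random examples whose size grows with i, and any misclassified example is returned as the counterexample. The batch size used here, ceil((ln(1/delta) + i ln 2) / epsilon), is the usual choice and is stated as an assumption, as are the toy learner for monotone conjunctions, the uniform example distribution, and all function names.

```python
import math
import random

def sample_size(i, epsilon, delta):
    """Number of random examples drawn in place of the i-th equivalence query."""
    return math.ceil((math.log(1.0 / delta) + i * math.log(2.0)) / epsilon)

def simulated_equivalence_query(hypothesis, draw_example, i, epsilon, delta):
    """Return a labeled counterexample (x, label) found by sampling, or None."""
    for _ in range(sample_size(i, epsilon, delta)):
        x, label = draw_example()
        if hypothesis(x) != label:
            return x, label
    return None

def learn_conjunction(n_vars, draw_example, epsilon, delta):
    """Toy equivalence-query learner for monotone conjunctions (hypothetical).
    It starts with every variable and drops those that are false in a
    positive counterexample, so the hypothesis never drops a target variable."""
    kept = set(range(n_vars))
    i = 0
    while True:
        i += 1
        hyp = lambda x, kept=frozenset(kept): all(x[j] for j in kept)
        cex = simulated_equivalence_query(hyp, draw_example, i, epsilon, delta)
        if cex is None:
            return kept          # no disagreement found in the sample
        x, _ = cex               # counterexamples are always positive here
        kept = {j for j in kept if x[j]}

rng = random.Random(0)
target = {0, 2}                  # assumed target concept: x0 AND x2

def draw_example():
    x = tuple(rng.randint(0, 1) for _ in range(5))
    return x, all(x[j] for j in target)

learned = learn_conjunction(5, draw_example, epsilon=0.1, delta=0.05)
```

The learner makes at most one query per dropped variable, so the total number of examples drawn is polynomial in the number of variables, 1/epsilon, and log(1/delta).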

We then showed that the learning algorithm for nonrecursive clauses can be extended 
to the case of a single linear recursive constant-depth determinate clause, leading to the 
result that this restricted class of recursive programs is also identifiable from equivalence 
queries. With a bit more effort, this algorithm can be further extended to learn a single 
k-ary recursive constant-depth determinate clause.

We also considered extending the learning algorithm to learn recursive programs consisting
of more than one constant-depth determinate clause. The most interesting extension
was to simultaneously learn a recursive clause $C_R$ and a base clause $C_B$, using equivalence
queries and also a "basecase oracle" that indicates which counterexamples should be covered
by the base clause $C_B$. In this model, it is possible to simultaneously learn a recursive clause
and a nonrecursive base case in all of the situations for which a recursive clause is learned
Language Family                        B  R  L/R  Oracles   Notation     Learnable

d-DepthNonRec[a-DB, a-DetDEC]          1  0  -    EQ        C_B          yes
d-DepthLinRec[a-DB, a-DetDEC]          0  1  1    EQ        1C_R         yes
d-Depth-k-Rec[a-DB, a-DetDEC]          0  1  k    EQ        kC_R         yes
d-Depth-2-Clause[a-DB, a-DetDEC]       1  1  1    EQ,BASE   1C_R|C_B     yes
kd-MaxRecLang[a-DB, a-DetDEC]          1  1  k    EQ,BASE   kC_R|C_B     yes

d-Depth-2-Clause[a-DB, a-DetDEC]       1  1  1    EQ        1C_R,C_B     =DNF
d-Depth-2-Clause'[a-DB, a-DetDEC]      0  2  1    EQ        2 x 1C_R     =DNF

d-DepthLinRecProg[a-DB, a-DetDEC]      0  n  1    EQ        n x 1C_R     no
d-DepthRec[a-DB, a-DetDEC]             0  1  n    EQ        nC_R         no
k-LocalLinRec[a-DB, a-DEC]             0  1  1    EQ        1C_R         no

Table 2: Summary by language of the learnability results. Column B indicates the number
of base (nonrecursive) clauses allowed in a program; column R indicates the number
of recursive clauses; L/R indicates the number of recursive literals allowed in
a single recursive clause; EQ indicates an oracle for equivalence queries and BASE
indicates a basecase oracle. For all languages except k-LocalLinRec, all clauses
must be determinate and of depth d.



alone; for instance, one can learn a k-ary recursive clause together with its nonrecursive
base case. This was our strongest positive result.

These results are summarized in Tables 1 and 2. In Table 1, a program with one r-ary
recursive clause is denoted $rC_R$, a program with one r-ary recursive clause and one
nonrecursive basecase is denoted $rC_R, C_B$, or $rC_R|C_B$ if there is a "basecase" oracle, and
a program with $s$ different r-ary recursive clauses is denoted $s \times rC_R$. The boxed results
are associated with one or more theorems from this paper, or its companion paper, and 
the unmarked results are corollaries of other results. A "+" after a program class indicates 
that it is identifiable from equivalence queries; thus the positive results described above are 
summarized by the four "+" entries in the lower left-hand corner of the section of the table 
concerned with constant-depth determinate clauses. 

Table 2 presents the same information in a slightly different format, and also relates the 
notation of Table 1 to the terminology used elsewhere in the paper. 

This paper has considered the learnability of the various natural generalizations of the
languages shown to be learnable in the companion paper. Consider for the moment single
clauses. The companion paper showed that for any fixed k a single k-ary recursive
constant-depth determinate clause is learnable. Here we showed that all of these restrictions
are necessary. In particular, a program of n constant-depth linear recursive clauses is not
polynomially predictable; hence the restriction to a single clause is necessary. Also, a single
clause with n recursive calls is hard to learn; hence the restriction to k-ary recursion is
necessary. We also showed that the restriction to constant-depth determinate clauses is
necessary, by considering the learnability of constant locality clauses. Constant locality
clauses are the only known generalization of constant-depth determinate clauses that is
pac-learnable in the nonrecursive case. However, we showed that if recursion is allowed,
then this language is not learnable: even a single linear recursive clause is not polynomially
predictable.

Again, these results are summarized in Table 1; a "$-$" after a program class means that
it is not polynomially predictable, under cryptographic assumptions, and hence neither 
pac-learnable nor identifiable from equivalence queries. 

The negative results based on cryptographic hardness give an upper bound on the ex- 
pressiveness of learnable recursive languages, but still leave open the learnability of programs 
with a constant number of k-ary recursive clauses in the absence of a basecase oracle. In 
the final section of this paper, we showed that the following problems are, in the model of 
polynomial predictability, equivalent to predicting boolean DNF: 

• predicting two-clause constant-depth determinate recursive programs containing one 
linear recursive clause and one base case; 

• predicting two-clause recursive constant-depth determinate programs containing two 
linear recursive clauses, even if the base case is known. 

We note that these program classes are very nearly the simplest classes of multi-clause
recursive programs that one can imagine, and that the pac-learnability of DNF is a long- 
standing open problem in computational learning theory. These results suggest, therefore, 
that pac-learning multi-clause recursive logic programs is difficult; at the very least, they 
show that finding a provably correct pac-learning algorithm will require substantial advances 
in computational learning theory. In Table 1, a "= DNF" (respectively "$\ge$ DNF") means that
the corresponding language is prediction-equivalent to DNF (respectively at least as hard 
as DNF). 

To further summarize Table 1: with any sort of recursion, only programs containing 
constant-depth determinate clauses are learnable. The only constant-depth determinate 
recursive programs that are learnable are those that contain a single k-ary recursive clause 
(in the standard equivalence query model) or a single k-ary recursive clause plus a base 
case (if a "basecase oracle" is allowed). All other classes of recursive programs are either
cryptographically hard, or as hard as boolean DNF. 

7. Conclusions 

Inductive logic programming is an active area of research, and one broad class of learning 
problems considered in this area is the class of "automatic logic programming" problems. 
Prototypical examples of this genre of problems are learning to append two lists, or to 
multiply two numbers. Most target concepts in automatic logic programming are recursive 
programs, and often, the training data for the learning system are simply examples of the 
target concept, together with suitable background knowledge. 

The topic of this paper is the pac-learnability of recursive logic programs from random 
examples and background knowledge; specifically, we wished to establish the computational 
limitations inherent in performing this task. We began with some positive results established
in a companion paper. These results show that one constant-depth determinate closed k-ary 
recursive clause is pac-learnable, and that further, a program consisting of one such recursive 
clause and one constant-depth determinate nonrecursive clause is also pac-learnable given 
an additional "basecase oracle". 
In this paper we showed that these positive results are not likely to be improved. In
particular, we showed that either eliminating the basecase oracle or learning two recursive
clauses simultaneously is prediction-equivalent to learning DNF, even in the case of
linear recursion. We also showed that the following problems are as hard as breaking
(presumably) secure cryptographic codes: pac-learning n linear recursive determinate clauses,
pac-learning one n-ary recursive determinate clause, or pac-learning one linear recursive
k-local clause.

These results contribute to machine learning in several ways. From the point of view 
of computational learning theory, several results are technically interesting. One is the 
prediction-equivalence of several classes of restricted logic programs and boolean DNF; this 
result, together with others like it (Cohen, 1993b), reinforces the importance of the learn- 
ability problem for DNF. This paper also gives a dramatic example of how adding recursion 
can have widely differing effects on learnability: while constant-depth determinate clauses 
remain pac-learnable when linear recursion is added, constant-locality clauses become cryp- 
tographically hard. 

Our negative results show that systems which apparently learn a larger class of recursive 
programs must be taking advantage either of some special properties of the target concepts 
they learn, or of the distribution of examples that they are provided with. We believe that 
the most likely opportunity for obtaining further positive formal results in this area is to 
identify and analyze these special properties. For example, in many examples in which 
FOIL has learned recursive logic programs, it has made use of "complete example sets" — 
datasets containing all examples of or below a certain size, rather than sets of randomly 
selected examples (Quinlan & Cameron-Jones, 1993). It is possible that complete datasets
allow a more expressive class of programs to be learned than random datasets; in fact, some 
progress has been recently made toward formalizing this conjecture (De Raedt & Dzeroski, 
1994). 
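
To make the notion of a complete example set concrete, the following sketch enumerates every instance of a three-place append relation over lists up to a fixed length and labels each one. It is only one possible reading of "all examples of or below a certain size"; the two-symbol alphabet, the length bound, and the function names are assumptions, and the sketch does not reproduce the datasets used in the cited experiments.

```python
from itertools import product

def all_lists(alphabet, max_len):
    """Every list over `alphabet` of length 0..max_len, represented as a tuple."""
    for length in range(max_len + 1):
        yield from product(alphabet, repeat=length)

def complete_append_examples(alphabet, max_len):
    """Label every triple (xs, ys, zs) of lists up to max_len for append(xs, ys, zs)."""
    lists = list(all_lists(alphabet, max_len))
    positives, negatives = [], []
    for xs, ys, zs in product(lists, repeat=3):
        # The triple is a positive example exactly when zs is xs followed by ys.
        (positives if xs + ys == zs else negatives).append((xs, ys, zs))
    return positives, negatives

positives, negatives = complete_append_examples(alphabet=("a", "b"), max_len=2)
```

Even for this small bound the complete set has 343 instances, of which 17 are positive, so a learner given the complete set sees every base case and every recursive step below the size bound, which a small random sample need not contain.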

Finally, and most importantly, this paper has established the boundaries of learnability 
for determinate recursive programs in the pac-learnability model. In many plausible auto- 
matic programming contexts it would be highly desirable to have a system that offered some 
formal guarantees of correctness. The results of this paper provide upper bounds on what 
one can hope to achieve with an efficient, formally justified system that learns recursive 
programs from random examples alone. 